Professional Documents
Culture Documents
On Quantum Statistical Inference - Ole E. Barndorff-Nielsen - 2003
On Quantum Statistical Inference - Ole E. Barndorff-Nielsen - 2003
B (2003)
65, Part 4, pp. 775–816
Ole E. Barndorff-Nielsen,
University of Aarhus, Denmark
Richard D. Gill
University of Utrecht and EURANDOM, Eindhoven, the Netherlands
[Read before The Royal Statistical Society at a meeting organized by the Research Section
on Wednesday , May 21st , 2003, Professor D. Firth in the Chair ]
1. Introduction
Quantum mechanics has replaced classical (Newtonian) mechanics as the basic paradigm for
physics. From there it pervades chemistry, molecular biology, astronomy, cosmology, . . . . The
theory is fundamentally stochastic: the predictions of quantum mechanics are probabilistic.
When used to derive properties of matter, the stochastic nature of the theory is typically swal-
lowed up by the law of large numbers (very large numbers, like Avogadro’s, 1023 ). However, in
some situations randomness does appear on the surface, most familiarly in the random times of
clicks of a Geiger counter. Present-day physicists, challenged by the fantastic theoretical promise
of a quantum computer, are carrying out experiments in which half a dozen ions are held in an
ion trap and individually pushed into lower or higher energy states, and into quantum super-
positions of such joint states. The existence of these wavelike superpositions of combinations of
distinct states of distinct objects is a fundamentally quantum phenomenon called entanglement.
Entanglement is of enormous importance in quantum computation and quantum communica-
tion. In other experiments, using supercooled electric circuits, billions of electrons behave as a
single quantum particle which is brought into a wavelike superposition of macroscopically dis-
tinct states (clockwise and anticlockwise current flow, for instance). This was recently achieved
Address for correspondence: Richard D. Gill, Mathematical Institute, University of Utrecht, Box 80010, 3508
TA Utrecht, The Netherlands.
E-mail: Richard.Gill@math.uu.nl
1.1. Overview
The paper is organized as follows. Section 2 describes the mathematical structure linking states
of a quantum system, possible measurements on that system, and the resulting state of the system
Quantum Statistical Inference 777
after measurement. Section 3 introduces quantum statistical models and notions of quantum
score and quantum information, parallel to the score function and Fisher information of clas-
sical parametric statistical models. In Section 4 we introduce quantum exponential models and
quantum transformation models, again forming a parallel with fundamental classes of models
in classical statistics. In Section 5 we describe notions of quantum exhaustivity, quantum suffi-
ciency and quantum cuts of a measurement, relating them to the classical notions of sufficiency
and ancillarity. We next turn, in Section 6, to a study of the relationship between quantum infor-
mation and classical Fisher information, in particular through Cramér–Rao-type information
bounds. In Section 7 we discuss the interrelation between classical and quantum probability and
statistics. Finally, in Section 8 we conclude with remarks on further topics of potential interest
to probabilists and statisticians. Sections 4, 5 and 6 contain a considerable amount of new work.
This paper complements our more mathematical survey (Barndorff-Nielsen et al., 2001a)
on quantum statistical information. A version of this paper with much further material (such
as foundational questions, Bell inequalities, infinite dimensional spaces and continuous time
observation of a quantum system) is available as Barndorff-Nielsen et al. (2001b). Many further
details can be found in Barndorff-Nielsen et al. (2003). Gill (2001a) is a tutorial introduction
to the basic modelling, and Gill (2001b) is an introduction to large sample quantum estima-
tion theory. Some general references which we have found to be extremely useful are Isham
(1995), Peres (1995), Gilmore (1994) and Holevo (1982, 2001a). The reader is also referred
to the ‘bible of quantum information’ (Nielsen and Chuang, 2000), which contains excellent
introductory material on the physics and the computer science, and to the basic probability
and statistics text Williams (2001) which recognizes (chapter 10) quantum probability as a topic
which should be in every statistician’s basic education. Finally, the former Los Alamos National
Laboratory preprint service for quantum physics, now at Cornell University, quant-ph at
http://arXiv.org, is an invaluable resource.
Note that ρi = |ii| is a d × d matrix of rank 1. It represents the operator which projects an
arbitrary vector into the one-dimensional subspace of H consisting of all (complex) multiples
of |i. We can easily check that it is a density matrix. It follows that ρ, a probability mixture
of density matrices, is also a density matrix. By the eigenvalue–eigenvector decomposition of
self-adjoint matrices, any density matrix can be written in the form (1), with the vectors |i
orthonormal.
If a density matrix ρ is of rank 1, we can write ρ = |φφ| for some vector |φ with φ2 = φ|φ =
1. The state is called a pure state and |φ is called the state vector; it is unique up to a com-
plex factor of modulus 1. If the rank of a density matrix is greater than 1 then the state
is called mixed. It can be written as a mixture of pure states in many different ways,
especially if we do not insist that the state vectors of the pure states are orthogonal to one
another.
The density matrix of a quantum system encapsulates in a very concise but rather abstract
way all the predictions that we can make about future observations on that system or, more
generally, all results of interaction of the system with the real world.
So far we have been using the word ‘measurement’ in a rather loose way, but at this point it is
important to make the technical distinction between mathematical models for a measurement
when we do not care about the state of the system after the measurement, but only about the
outcome, and models for a measurement including the state of the system after the measure-
ment. The former is called a measurement and is denoted by M; the latter, more complicated
object, is called an instrument and is denoted by N.
Let us start with the simpler object, a measurement. Consider a measurement with discrete
outcome, i.e. the sample space of the outcome is at most countable. From quantum theory it
follows that any measurement whatsoever, i.e. any experimental set-up, is described mathemati-
cally by a collection M of d × d matrices m.x/ indexed by the outcomes x of the experiment. The
Quantum Statistical Inference 779
matrices must be non-negative (and hence also self-adjoint) and must add up to the identity
matrix 1. Let us write p.x; ρ, M/ for the probability that applying the measurement M to the
state ρ produces the outcome x. Then we have the fundamental formula
We can see that this expression indeed defines a bona fide probability distribution as follows.
Writing ρ = Σ pi |φi φi | and permuting cyclicly the elements in a trace of a product of matrices,
we find
tr{ρ m.x/} = pi tr{|φi φi |m.x/} = pi φi |m.x/|φi :
i
Thus, since m.x/ is a non-negative matrix and the pi are probabilities, p.x; ρ, M/ is a non-
negative real number. Moreover, the sum over x of these numbers is
tr{ρ m.x/} = tr{ρ m.x/} = tr.ρ1/ = tr.ρ/ = 1:
x x
A quantum statistical model is a model for a partly or completely unknown state. This means
that the state ρ is supposed to depend on an unknown parameter θ in some parameter space
Θ. Write ρ = .ρ.θ/ : θ ∈ Θ/. When we apply a measurement M to a quantum system from this
model, the outcome has probability density
Thus, given the measurement and the quantum statistical model, a classical statistical inference
problem is defined. Very important problems also arise when the measurement itself is indexed
by an unknown parameter, but for brevity we do not address these here.
In principle, any measurement M whatever could be implemented as a laboratory experiment.
Equation (3) tells us implicitly how much information about θ can be obtained from a given
experimental set-up M. We may try to choose M in such a way as to maximize the information
which the experiment will give about θ. Such experimental design problems are a main subject
of this paper.
Often we are interested also in the state of the system after the measurement. In this case we
need the more general notion of an instrument. An instrument N (more precisely, a ‘completely
positive instrument’) is represented by a family of collections of d × d matrices ni .x/ satisfying
ni .x/Å ni .x/ = 1
x i
but being otherwise completely arbitrary. The index x refers to the observed outcome of the
measurement; the index i could be thought of as ‘missing data’. Define
m.x/ = ni .x/Å ni .x/:
i
It follows that the matrices m.x/ are non-negative (and self-adjoint) and add to the identity ma-
trix, and thus represent a measurement (in the narrow or technical sense) M. When we apply the
instrument N to the quantum system, the outcome has the same probability density as equation
(2), but we write it out in terms of N as
p.x; ρ, N / = tr{ρ ni .x/Å ni .x/} .4/
i
780 O. E. Barndorff-Nielsen, R. D. Gill and P. E. Jupp
and the state of the system after applying the measurement, conditioned on observing the out-
come x, is
ni .x/ρ ni .x/Å
i
σ.x; ρ, N / = : .5/
tr{ρ ni .x/Å ni .x/}
i
The reader should check that the expression for σ.x; ρ, N/ does define a bona fide density matrix
(non-negative, trace 1). In some important practical problems the instrument itself depends on
an unknown parameter, but here we suppose that it is completely known.
It follows from quantum physics that whatever we can do to a quantum system must have the
form of a quantum instrument. Moreover, in principle, any quantum instrument whatsoever
could be realized by some experimental set-up. Usually in the theory we start by postulating
some natural physical properties of the transformation from input or prior state to output or
posterior state and data, and derive equations (4) and (5), which are then called the Kraus rep-
resentation of the instrument, as a theorem. Here it is more convenient to start with equations
(4) and (5). Further discussion and references are given in Section 2.7.
We could consider applying two different quantum instruments, one after the other, to the
same quantum system. We might even allow the choice of the second instrument to depend on
the outcome that is obtained from the first. The composition of two instruments in this way
defines a new one; it is not difficult to express the matrices ni .x/ of the new instrument in terms
of those of the old ones. Another way to obtain new instruments from old is by coarsening. Sup-
pose that we apply one instrument to a quantum system, then we apply a many-to-one function
of the outcome, and discard the original data. The new instrument can be written down in terms
of the old by relabelling the matrices ni .x/ with new index j and variable y in obvious fashion.
In classical statistics, central notions such as sufficiency are connected to decomposing sta-
tistical models into parts (marginal and conditional distributions), and to reducing statisti-
cal models by reducing data. Starting with a quantum statistical model with density matrices
ρ depending on a parameter θ, possibly with nuisance parameters also, it is now natural to
ask whether notions akin to sufficiency and ancillarity can be developed for instruments. For
instance, it might happen that the posterior state of a quantum system after applying a certain
instrument no longer depends on the unknown parameter.
In Section 2.2 we shall work out many of these notions for the important special case
of a two-dimensional quantum system. First we present two special examples, connecting the
notion of an instrument to the classical notions (in quantum physics) of observables and unitary
transformations.
Any pure state has the form |ψψ| for some unit vector |ψ in C2 . Up to a complex factor of
modulus 1 (the phase, which does not influence the state), we can write |ψ as
exp.−iφ=2/ cos.θ=2/
|ψ = :
exp.iφ=2/ sin.θ=2/
The corresponding pure state is
cos2 .θ=2/ exp.−iφ/ cos.θ=2/ sin.θ=2/
ρ= :
exp.iφ/ cos.θ=2/ sin.θ=2/ sin2 .θ=2/
784 O. E. Barndorff-Nielsen, R. D. Gill and P. E. Jupp
A little algebra shows that ρ can be written as ρ = .1 + ux σx + uy σy + uz σz /=2 = 21 .1 + u · σ /,
where σ = .σx , σy , σz / are the three Pauli spin matrices and u = .ux , uy , uz / = u .θ, φ/ is the point
on the unit sphere with polar co-ordinates .θ, φ/.
An arbitrary mixed state is obtained by averaging pure states ρ = 21 .1 + u · σ / with respect to
any probability distribution over real unit vectors u . The result is a density matrix of the form
ρ = 21 .1 + a · σ /, where a is the centre of mass (a point in the unit ball) of the distribution of pure
states seen as a distribution over the unit sphere. The co-ordinates of a are called the Stokes
parameters when we are using this model to describe polarization of a photon, rather than spin
of an electron.
is called the quantum superposition of the original two states, with complex weights c1 and c2 . This
is a completely different way of combining two states from the mixture p1 |φ1 φ1 | + p2 |φ2 φ2 |.
(Sometimes the latter is called an ‘uncoherent mixture’ and the former a ‘coherent mixture’.)
For example, consider the case√ d = 2, let |φ1 and |φ2 form an orthornormal basis of H = C2
1
and suppose that c1 = c2 = 1= 2 and p1 = p2 = 2 . We consider the equal weights superposition
and the equal weights mixture of the pure states |φ1 and |φ2 , showing how some measurements
can distinguish between these states, whereas others do not.
The two matrices m.1/ = |φ1 φ1 |, m.2/ = |φ2 φ2 | define a measurement M φ with two possi-
ble outcomes 1 and 2, say. The probability distributions of the outcome under the superposition
and under the mixture just defined are identical (probabilities 21 for each of the outcomes 1
and 2). √ √
Define now a new orthonormal basis |ψ1 = .|φ1 + |φ2 /= 2 and |ψ1 = .|φ1 − |φ2 /= 2.
Corresponding to this basis, we can construct a measurement M ψ in the same way as before.
When the superposition is measured with M ψ , the outcome 1 is certain and the outcome 2 is
impossible. However, when the mixture is measured with M ψ , the two outcomes have equal
probability 21 .
It is very important to note that a pure state can be expressed as a superposition of others,
and a mixed state as a mixture of others, in many different ways.
ρt = exp.tH=i/ρ0 exp.−tH=i/,
where ρt denotes the state at time t, = 1:05 × 10−34 Js is Planck’s constant and H is a self-
adjoint operator on H. If ρ0 is a pure state then ρt is pure for all t and we can choose unit vectors
ψt such that ρt = |ψt ψt | and
ψt = exp.tH=i/ψ0 : .8/
We next discuss the notion of quantum randomization, whereby adding an auxiliary quantum
system to a system under study gives us further possibilities for probing the system of interest.
This also connects to the important notion of realization, i.e. representing a measurement by a
simple measurement on a quantum randomized extension.
Suppose that we have a Hilbert space H, and a pair .K, ρa /, where K is a Hilbert space and
ρa is a state on K. Any measurement M̃ on the product space H ⊗ K induces a measurement
M on H by the defining relationship
tr{ρ m.x/} = tr{.ρ ⊗ ρa / m̃.x/} for all states ρ on H, all outcomes x: .9/
The pair .K, ρa / is called an ancilla. The following theorem (Holevo’s extension of Naimark’s
theorem) states that any measurement M on H has the form (9) for some ancilla .K, ρa / and
some simple measurement M̃ on H ⊗ K. The triple .K, ρa , M̃/ is called a realization of M (the
words extension or dilation are also used sometimes). Adding an ancilla before taking a simple
measurement could be thought of as quantum randomization.
Theorem 1 (Holevo, 1982). For every measurement M on H, there is an ancilla .K, ρa / and a
simple measurement M̃ on H ⊗ K which form a realization of M.
We use the term ‘quantum randomization’ because of its analogy with the mathematical
representation of randomization in classical statistics, whereby we replace the original prob-
ability space with a product space, one of whose components is the original space of
interest, whereas the other corresponds to an independent random experiment with prob-
abilities under the control of the experimenter. Just as randomization in classical statistics
is sometimes needed to solve optimization problems of statistical decision theory, so quan-
tum randomization sometimes allows for strictly better solutions than can be obtained with-
out it.
Here is a simple spin half example of a non-simple measurement which cannot be represented
without quantum randomization.
for all B. When H is finite dimensional, as in this paper, every measurement is dominated: take
ν.A/ = tr{M.A/}. In the dominated case, the outcome of the measurement has a probabili-
ty distribution with density p.x; ρ, M/ = tr{ρ m.x/} with respect to ν. If the outcome space is
discrete and ν is counting measure, then these notations are linked to our original set-up by
m.x/ = M.{x}/, M.A/ = Σx∈A m.x/.
To exemplify these notions, suppose that for some dominated measurement M we can write
m.x/ = m1 .x/+m2 .x/ for two non-negative self-adjoint matrix-valued functions m1 and m2 . De-
fine M to be the measurement on the outcome space X = X×{1, 2} with density m.x, i/ = mi .x/,
.x, i/ ∈ X , with respect to the product of ν with counting measure. Then M is a refinement
of M.
We described earlier how we can form product spaces from separate quantum systems, lead-
ing to notions of product states, separable states and entangled states. Given a measurement
M on one component of a product space, we can naturally talk about ‘the same measurement’
on the product system. It has components M.A/ ⊗ 1. Given measurements M and M defined
on the two components of a product system, we can define in a natural way the measurement
‘apply M and M simultaneously to the first and second component respectively’: its outcome
space is the product of the two outcome spaces, and it is defined using obvious notation by
.M ⊗ M /.A × A / = M.A/ ⊗ M .A /.
A measurement M on a product space is called separable if it has a density m such that each
m.x/ can be written as a positive linear combination of tensor products of non-negative com-
ponents. It can then be thought of as a coarsening of a measurement with density m such that
each m .y/ is a product of non-negative components.
These formulae generalize naturally equations (4) and (5). In the physics literature, this kind of
representation is often called the Kraus representation of a completely positive instrument. For
brevity we do not explain these terms, in particular ‘complete positivity’, further. The interested
reader is referred to Davies and Lewis (1970), Davies (1976), Kraus (1983), Ozawa (1985),
Nielsen and Chuang (2000), Loubenets (2001) and Holevo (2001a).
When the posterior state is disregarded, the instrument N gives rise to the measurement M
with density
m.x/ = ni .x/Å ni .x/
i
with respect to the dominating measure ν. Clearly, a measurement M can be represented as the
‘data part’ of an instrument in very many different ways.
Further results of Ozawa (1985) generalize the realizability of measurements (Naimark and
Holevo theorems) to the realizability of an arbitrary completely positive instrument. Namely,
after forming a compound system by taking the tensor product with some ancilla, the instru-
ment can be realized as a unitary (Schrödinger) evolution for some length of time, followed by
the action of a simple instrument (measurement of an observable, with state transition accord-
ing to the Lüders–von Neumann projection postulate). Therefore to say that the most general
operation on a quantum system is a completely positive instrument comes down to saying that
the only mechanisms known in quantum mechanics are Schrödinger evolution, von Neumann
measurement and forming compound systems. Combining these ingredients in arbitrary ways,
we remain within the class of completely positive instruments; moreover, anything in that class
can be realized in this way.
An instrument defined on one component of a product system can be extended in a natural
way (similarly to that described in Section 2.6 for measurements) to an instrument on the product
system. Conversely, it is of great interest whether instruments on a product system can in some
way be reduced to ‘separate instruments on the separate subsystems’. There are two important
notions in this context. The first (similar to the concept of separability of measurements) is the
mathematical concept of separability of an instrument defined on a product system: this is that
each ni .x/ in the Kraus representation of an instrument is a tensor product of separate matri-
ces for each component. The second is the physical property which we shall call multilocality:
an instrument is called multilocal if it can be represented as a coarsening of a composition of
separate instruments applied sequentially to separate components of the product system, where
the choice of each instrument at each stage may depend on the outcomes of the instruments
applied previously. Moreover, each component of the system may be measured several times
(i.e. at different stages), and the choice of component measured at the nth stage may depend on
the outcomes at previous stages. One should think of the different components of the quantum
system as being localized at different locations in space. At each location separately, anything
quantum is allowed, but all communication between locations is classical. It is a theorem of
Quantum Statistical Inference 789
Bennett et al. (1999) that every multilocal instrument is separable, but that (surprisingly) not
all separable instruments are multilocal. It is an open problem to find a physically meaningful
characterization of separability, and conversely to find a mathematically convenient characteri-
zation of multilocality. (Note that our terminology is not standard: the word ‘unentangled’ is
used by some instead of ‘separable’, and ‘separable’ instead of ‘multilocal’.)
Not all joint measurements (by which we just mean instruments on product systems) are sep-
arable, let alone multilocal. Just as quantum randomized measurements can give strictly more
powerful ways to probe the state of a quantum system than (combinations of) simple measure-
ments and classical randomization, so non-separable measurements can do strictly better than
separable measurements at extracting information from product systems, even if a priori there
is no interaction of any kind between the subsystems.
and log-likelihood
For simplicity, let us suppose that θ is one dimensional. It is often useful to express the log-
likelihood derivative in terms of the symmetric logarithmic derivative or quantum score of ρ,
denoted by ρ==θ . This is defined implicitly as the self-adjoint solution of the equation
where we have used the fact that for any self-adjoint matrices P, Q and R and any matrix T the
trace operation satisfies tr.PQR/ = tr.RQP/ and tr.T/ = 21 tr.T + T Å /. It follows that
It plays a key role in the quantum context, just as in classical statistics, and is discussed in
Section 6. In particular, we shall discuss there its relationship with the expected or Fisher
quantum information
I.θ/ = tr{ρ.θ/ ρ==θ .θ/2 }: .15/
The quantum score is a self-adjoint operator and therefore may be interpreted as an observable
which we might measure on the quantum system. What we have just seen is that the outcome
of a measurement of the quantum score has mean 0 and variance equal to the quantum Fisher
information.
Three important special types of quantum exponential model are those in which T1 , . . . , Tk
are self-adjoint (and, for the first type, T0 , T1 , . . . , Tk all commute), and the quantum states have
the forms
ρ.θ/ = exp{−κ.θ/} exp.T0 + θr Tr /, .16/
ρ.θ/ = exp{−κ.θ/} exp. 21 θr Tr /ρ0 exp. 21 θr Tr /, .17/
ρ.θ/ = exp.− 21 iθr Tr /ρ0 exp. 21 iθr Tr / .18/
respectively, where θ = .θ1 , . . . , θk / ∈ Rk and ρ0 is non-negative (and self-adjoint), and the sum-
mation convention is in force.
We call these three types the quantum exponential models of mechanical type, symmetric
type and unitary type respectively. The mechanical type arises (at least, with k = 1) in quantum
Quantum Statistical Inference 791
statistical mechanics as a state of statistical equilibrium; see Gardiner and Zoller (2000), sec-
tion 2.4.2. The symmetric type has theoretical statistical significance, as we shall see, connected
among other things to the fact that the quantum score for this model is easy to compute explicitly.
The unitary type has physical significance connected to the fact that it is also a transformation
model (quantum transformation models are defined in Section 4.2). The mechanical type is a
special case of the symmetric type when T0 , T1 , . . . , Tk all commute.
In general, the statistical model that is obtained by applying a measurement to a quantum
exponential model is not an exponential model (in the classical sense). However, for a quantum
exponential model of the form (17) in which
Tj = tj .X/ j = 1, . . . , k, for some self-adjoint X, .19/
i.e. the Tj commute, the statistical model that is obtained by applying the measurement X is a
full exponential model. Various pleasant properties of such quantum exponential models then
follow from standard properties of the full exponential models.
The classical Cramér–Rao bound for the variance of an unbiased estimator t of θ is
var.t/ i.θ; M/−1 : .20/
Combining inequality (20) with Braunstein and Caves’s (1994) quantum information bound
i.θ; M/ I.θ/ (see inequality (27) in Section 6.2) yields Helstrom’s (1976) quantum Cramér–
Rao bound
var.t/ I.θ/−1 , .21/
whenever t is an unbiased estimator based on a quantum measurement. It is a classical result
that, under certain regularity conditions, the following are equivalent:
(a) equality holds in expression (20),
(b) the score is an affine function of t and
(c) the model is exponential with t as canonical statistic (see pages 254–255 of Cox and
Hinkley (1974)).
This result has a quantum analogue; see theorems 3 and 4 and corollary 1 below, which states that
under certain regularity conditions there is equivalence between
(a) equality holds in expression (21) for some unbiased estimator t based on some measure-
ment M,
(b) the symmetric quantum score is an affine function of commuting T1 , . . . , Tk and
(c) the quantum model is a quantum exponential model of type (17) where T1 , . . . , Tk satisfy
condition (19).
The regularity conditions which we assume below are indubitably too strong: the result should
be true under minimal smoothness assumptions.
where U is a fixed 2 × 2 unitary matrix and σx and σy are two of the Pauli spin matrices, and
the parameter θ varies through [0, 2π/; see Section 2.2. The matrix U can always be written as
exp.−iφ u · σ / for some real three-dimensional unit vector u
and angle φ. Considered as a curve
on the Poincaré sphere, the model forms a great circle. If U is the identity (or, equivalently,
φ = 0) the curve just follows the equator; the presence of U corresponds to rotating the sphere
carrying this curve about the direction u through an angle φ. Thus our model describes an
arbitrary great circle on the Poincaré sphere, parameterized in a natural way. Since we can write
where the unitary matrix Vθ corresponds to rotation of the Poincaré sphere by an angle θ about
the z-axis, we can write this model as a transformation model. The model is clearly also an
exponential model of unitary type. Perhaps surprisingly, it can be reparameterized so as also to
be an exponential model of symmetric type. We leave the details (which depend on the algebraic
properties of the Pauli spin matrices) to the reader, but we just point out that a one-parameter
pure state exponential model of symmetric type must be of the form
for some real unit vectors u and v, since every self-adjoint 2 × 2 matrix is an affine function of a
spin matrix u
· σ . Now write out the exponential of a matrix as its power series, and use the fact
that the square of any spin matrix is the identity. This example is due to Fujiwara and Nagaoka
(1995).
where π.dx|t; θ, N/ is the conditional distribution of x given T.x/ = t computed from π.dx; θ, N /.
Definition 4 (quantum sufficiency of instruments). Let N be a coarsening of an instrument
N by T : .X, A/ → .Y, B/. Then N is said to be quantum sufficient with respect to a family of
states .ρ.θ/ : θ ∈ Θ/ if
(a) the measurement M determined by tr{M .·/ρ} = π.·; ρ, N / is sufficient for the measure-
ment M determined by tr{M.·/ρ} = π.·; ρ, N/, with respect to the family .ρ.θ/ : θ ∈ Θ/,
and
(b) for any x ∈ X, the posterior families .σ.x; θ, N/ : θ ∈ Θ/ and .σ{T.x/; θ, N } : θ ∈ Θ/ are
inferentially equivalent.
Thus the likelihood function of the statistical model that is obtained by applying M to ρ
is equivalent to that obtained by applying M to σ, for the same outcome of each instru-
ment.
A consideration of the distribution of the likelihood ratio leads to the following definition.
Definition 8 (weak likelihood equivalence). Two parametric families of states ρ = .ρ.θ/ : θ ∈ Θ/
and σ = .σ.θ/ : θ ∈ Θ/ on Hilbert spaces H and K respectively are said to be weakly likeli-
hood equivalent if for every measurement M on H with outcome space X there is a measure-
ment M on K with some outcome space Y such that the likelihood ratios
tr{M.dx/ ρ.θ/} tr{M .dy/ σ.θ/}
and
tr{M.dx/ ρ.θ /} tr{M .dy/ σ.θ /}
have the same distribution for all θ and θ in Θ, and vice versa.
The precise connection between likelihood equivalence and inferential equivalence is not yet
known but the following conjecture appears reasonable.
Conjecture 1. Two quantum models are strongly likelihood equivalent if and only if they are
inferentially equivalent up to quantum randomization.
796 O. E. Barndorff-Nielsen, R. D. Gill and P. E. Jupp
6. Quantum and classical Fisher information
In Section 3 we showed how to express the Fisher information in the outcome of a measure-
ment in terms of the quantum score. In this section we discuss quantum analogues of Fisher
information and their relationship to the classical concepts.
which is a quantum analogue of the classical relationship i.θ/ = Eθ {j.θ/} between expected and
observed information (where j.θ/ = −l=θ=θ .θ/). Note that J.θ/ is an observable, just as j.θ/ is a
random variable.
Neither I.θ/ nor J.θ/ depends on the choice of measurement, whereas i.θ/ = i.θ; M/ does
depend on the measurement M. Expected quantum information behaves additively, i.e. for
parametric quantum models of states of the form ρ : θ → ρ1 .θ/ ⊗ . . . ⊗ ρn .θ/ (which model
‘independent particles’) the associated expected quantum information satisfies
n
Iρ1 ⊗:::⊗ρn .θ/ = Iρi .θ/,
i=1
When the state ρ is strictly positive, and under further non-degeneracy conditions, essentially
the only way to achieve the bound (27) is through measuring the quantum score. In the discussion
below we first keep the value of θ fixed. Since any non-negative self-adjoint matrix can be written
as a sum of rank 1 matrices (using its eigenvalue–eigenvector decomposition), it follows that any
dominated measurement can be refined to a measurement for which each m.x/ is of rank 1. Thus
m.x/ = r.x/ |ξ.x/ξ.x/|
798 O. E. Barndorff-Nielsen, R. D. Gill and P. E. Jupp
for some real r.x/ and state vector |ξ.x/; see the end of Section 2.6. If one measurement is
the refinement of another, then the distributions of the outcomes are related in the same way.
Therefore, under refinement of a measurement, Fisher expected information cannot decrease.
Therefore, if any measurement achieves equality in expression (27), there is also a measurement
with rank 1 components achieving the bound. Consider such a measurement. Suppose that ρ is
positive and that all the eigenvalues of ρ==θ are different. The condition
m.x/1=2 ρ==θ ρ1=2 = r.x/ m.x/1=2 ρ1=2
is then equivalent to
|ξ.x/ξ.x/|ρ==θ = r.x/ |ξ.x/ξ.x/|,
which states that ξ.x/ is an eigenvector of ρ==θ . Since we must have m.x/ ν.dx/ = 1, it follows
that all eigenvectors of ρ==θ occur in this way in components m.x/ of M. The measurement can
therefore be reduced or coarsened to a simple measurement of the quantum score, and the
reduction (at the level of the outcome) is sufficient.
Suppose now that the state ρ.θ/ is strictly positive for all θ, and that the quantum score has
distinct eigenvalues for at least one value of θ. Suppose that a single measurement exists attain-
ing equality in expression (27) uniformly in θ. Any refinement of this measurement therefore
also achieves the bound uniformly; in particular, the refinement to components which are all
proportional to projectors onto orthogonal one-dimensional eigenspaces of the quantum score
at the value of θ where the eigenvalues are distinct does so. Therefore the eigenvectors of the
quantum score at this value of θ are eigenvectors at all other values of θ. Therefore there is a
self-adjoint operator X with distinct eigenvalues such that ρ==θ .θ/ = f.X; θ/ for each θ. Fix θ0
and let θ
F.X; θ/ = f.X; θ/ dθ:
θ0
Let ρ0 = ρ.θ0 /. If we consider the defining equation (12) as a differential equation for ρ.θ/ given
the quantum score, and with initial condition ρ.θ0 / = ρ0 , we see that a solution is
ρ.θ/ = exp{ 21 F.X; θ/}ρ0 exp{ 21 F.X; θ/}:
Under smoothness conditions the solution is unique. Rewriting the form of this solution, we
come to the following theorem.
Theorem 2 (uniform attainability of quantum information bound). Suppose that the state
is everywhere positive, the quantum score has distinct eigenvalues for some value of θ and
is smooth. Suppose that a measurement M exists with i.θ; M/ = I.θ/ for all θ, thus attaining
the Braunstein–Caves information bound (27) uniformly in θ. Then there is an observable X
such that a simple measurement of X also achieves the bound uniformly, and the model is of
the form
ρ.θ/ = c.θ/ exp{ 21 F.X; θ/}ρ0 exp{ 21 F.X; θ/} .29/
for a function F , indexed by θ, of an observable X where c.θ/ = 1=tr[ρ0 exp{F.X; θ/}], ρ==θ .θ/ =
f.X; θ/−tr{ρ.θ/f.X; θ/} and f.X; θ/ = F=θ .X; θ/. Conversely, for a model of this form, a mea-
surement of X achieves the bound uniformly.
Remark 2 (spin half case). In the spin half case, if the information is positive then the quantum
score has distinct eigenvalues, since the outcome of a measurement of the quantum score always
equals one of the eigenvalues, has mean 0 and positive variance.
Quantum Statistical Inference 799
Theorem 3 (uniform attainability of quantum Cramér–Rao bound). Suppose that the posi-
tivity and non-degeneracy conditions of theorem 2 are satisfied, and suppose that for the
outcome of some measurement M there is a statistic t such that, for all θ, t is an unbiased
estimator of θ achieving Helstrom’s quantum Cramér–Rao bound (21), var.t/ = I.θ/−1 . Then
the model is a quantum exponential model of symmetric type (17),
ρ.θ/ = c.θ/ exp. 21 θT/ρ0 exp. 21 θT/,
for some observable T , and simple measurement of T is equivalent to the coarsening of M
by t.
Proof. The coarsening M = M ◦ t −1 by t of the measurement M also achieves the quantum
information bound (27) uniformly, i.e. i.θ; M / = I.θ/. Applying theorem 2 to this measurement,
we discover that the model is of the form (29), and (if necessary refining the measurement to
have rank 1 components) t can be considered as a function of the outcome of a measurement
of the observable X, and it achieves the classical Cramér–Rao bound for unbiased estimators
of θ based on this outcome. The density of the outcome (with respect to counting measure on
the eigenvalues of X) is found to be
c.θ/ exp{F.x; θ/} tr{ρ0 Π.x/},
where Π.x/ is the projector onto the eigenspace of X corresponding to eigenvalue x. Hence, up
to addition of functions of θ or x alone, F.x; θ/ is of the form θ t.x/.
The basic inequality (27) holds also when the dimension of θ is greater than 1. In that case,
the quantum information matrix I.θ/ is defined in equation (26) and the Fisher information
matrix i.θ; M/ is defined by
irs .θ; M/ = Eθ {lr .θ/ ls .θ/},
where lr denotes l=θr etc. Then inequality (27) holds in the sense that I.θ/ − i.θ; M/ is positive
semidefinite. The inequality is sharp in the sense that I.θ/ is the smallest matrix dominating all
i.θ; M/. However, it is typically not attainable, let alone uniformly attainable.
Theorem 2 can be generalized to the case of a vector parameter. This also leads to a general-
ization of theorem 3, which is the content of corollary 1 below.
Theorem 4. Let .ρ.θ/ : θ ∈ Θ/ be a twice-differentiable parametric quantum model. If
(a) there is a measurement M with i.θ; M/ = I.θ/ for all θ,
(b) ρ.θ/ is positive for all θ and
(c) Θ is simply connected
then, for any θ0 in Θ, there are an observable X and a function F (possibly depending on θ0 )
such that
ρ.θ/ = exp{ 21 F.X; θ/} ρ.θ0 / exp{ 21 F.X; θ/}:
Corollary 1. If, under the conditions of theorem 4, there is an unbiased estimator t of θ based
on the measurement M achieving condition (21), then the model is a quantum exponential
family of symmetric type (17) with commuting Tr .
Versions of these results have been known for some time; see Young (1975), Fujiwara and
Nagaoka (1995) and Amari and Nagaoka (2000); compare especially our corollary 1 with Amari
and Nagaoka (2000), theorem 7.6, and our theorem 4 with parts (I)–(IV) of the subsequent out-
lined proof in Amari and Nagaoka (2000). Unfortunately, precise regularity conditions and
800 O. E. Barndorff-Nielsen, R. D. Gill and P. E. Jupp
detailed proofs seem to be available elsewhere only in some earlier publications in Japanese.
We have obtained the same conclusions, by a different proof, in the spin half pure state case,
example 7. This indicates that a more general result is possible without the hypothesis of posi-
tivity of the state. See Matsumoto (2002) for important new work on the pure state case.
The symmetric logarithmic derivative is not the unique quantum analogue of the classical
statistical concept of score. Other analogues include the right, left and balanced derivatives
obtained from suitable variants of equation (12). Each of these gives a quantum information
inequality and a quantum Cramér–Rao bound analogous to inequalities (27) and (21). See
Belavkin (1976) and (as yet) unpublished new work by this author. There is no general relation-
ship between the various quantum information inequalities when the dimension of θ is greater
than 1.
Asymptotic optimality theory for quantum estimation has only just started to be developed;
see Gill and Massar (2000), Gill (2001b) and Keyl and Werner (2003); for an application see
Hannemann et al. (2002).
8.4. Conclusion
This paper has, in brief form, presented our current view of a role for statistical inference in
quantum physics. We are keenly aware that many relevant parts of quantum physics and quan-
tum stochastics have not been reviewed or have only been touched on.
Acknowledgements
We gratefully acknowledge Mathematische Forschungsinstitut Oberwolfach for support through
the ‘Research in pairs’ programme, and the European Science Foundation’s programme on
quantum information for supporting a working visit to the University of Pavia.
We have benefited from conversations with many colleagues. We are particularly grateful to
Elena Loubenets, Hans Maassen, Franz Merkl, Klaus Mølmer and Philip Stamp.
804 O. E. Barndorff-Nielsen, R. D. Gill and P. E. Jupp
References
Amari, S.-I. and Nagaoka, H. (2000) Methods of Information Geometry. Oxford: Oxford University Press.
Aspect, A., Dalibard, J. and Roger, G. (1982) Experimental test of Bell’s inequalities using time-varying analysers.
Phys. Rev. Lett., 49, 1804–1807.
Banaszek, K., D’Ariano, G. M., Paris, M. G. A. and Sacchi, M. F. (2000) Maximum-likelihood estimation of the
density matrix. Phys. Rev. A, 61, 010304(R).
Barndorff-Nielsen, O. E. and Cox, D. R. (1994) Inference and Asymptotics. London: Chapman and Hall.
Barndorff-Nielsen, O. E. and Gill, R. D. (2000) Fisher information in quantum statistics. J. Phys. A, 33, 4481–4490.
Barndorff-Nielsen, O. E., Gill, R. D. and Jupp, P. E. (2001a) Quantum information. In Mathematics Unlimited—
2001 and Beyond, part I (eds B. E. Engquist and W. Schmid), pp. 83–107. Heidelberg: Springer.
Barndorff-Nielsen, O. E., Gill, R. D. and Jupp, P. E. (2001b) On quantum statistical inference. Research Report
2001-19. Centre for Mathematical Physics and Stochastics, University of Aarhus, Aarhus. (Available from
http://arxiv.org/abs/quant-ph/0307189.)
Barndorff-Nielsen, O. E., Gill, R. D. and Jupp, P. E. (2003) Quantum statistics. To be published.
Barndorff-Nielsen, O. E. and Koudou, A. E. (1995) Cuts in natural exponential families. Teor. Veroy. Primen., 2,
361–372.
Barndorff-Nielsen, O. E. and Loubenets, E. R. (2002) General framework for the behaviour of continuously
observed quantum systems. J. Phys. A, 35, 565–588.
Barndorff-Nielsen, O. E. and Thorbjørnsen, S. (2002a) Selfdecomposability and Lévy processes in free probability.
Bernoulli, 8, 323–366.
Barndorff-Nielsen, O. E. and Thorbjørnsen, S. (2002b) Lévy laws in free probability. Proc. Natn. Acad. Sci. USA,
99, 16568–16575.
Barndorff-Nielsen, O. E. and Thorbjørnsen, S. (2002c) Lévy processes in free probability. Proc. Natn. Acad. Sci.
USA, 99, 16576–16580.
Belavkin, V. P. (1976) Generalized Heisenberg uncertainty relations, and efficient measurements in quantum
systems. Theor. Math. Phys., 26, 213–222.
Belavkin, V. P. (2002) Quantum causality, stochastics, trajectories and information. Rep. Prog. Phys., 65, 353–420.
Bell, J. S. (1964) On the Einstein Podolsky Rosen paradox. Physics, 1, 195–200.
Bennett, C. H., DiVincenzo, D. P., Fuchs, C. A., Mor, T., Rains, E., Shor, P. W., Smolin, J. A. and Wootters,
W. K. (1999) Quantum nonlocality without entanglement. Phys. Rev. A, 59, 1070–1091.
Biane, P. (1995) Calcul stochastique non-commutatif. Lect. Notes Math., 1608, 1–96.
Biane, P. and Speicher, R. (2001) Free diffusions, free entropy and free Fisher information. Ann. Inst. H. Poincaré
Probab. Statist., 37, 581–606.
Brandt, S. and Dahmen, H. D. (1995) The Picture Book of Quantum Mechanics. Heidelberg: Springer.
Braunstein, S. L. and Caves, C. M. (1994) Statistical distance and the geometry of quantum states. Phys. Rev.
Lett., 72, 3439–3443.
Christensen, B. J. and Kiefer, N. M. (1994) Local cuts and separate inference. Scand. J. Statist., 21, 389–401.
Christensen, B. J. and Kiefer, N. M. (2000) Panel data, local cuts and orthogeodesic models. Bernoulli, 6, 667–678.
Cox, D. R. and Hinkley, D. V. (1974) Theoretical Statistics. London: Chapman and Hall.
D’Ariano, G. M. (1997a) Quantum estimation theory and optical detection. In Quantum Optics and the Spec-
troscopy of Solids (eds T. Hakioğlu and A. S. Shumovsky), pp. 135–174. Amsterdam: Kluwer.
D’Ariano, G. M. (1997b) Measuring quantum states. In Quantum Optics and the Spectroscopy of Solids (eds
T. Hakioğlu and A. S. Shumovsky), pp. 175–202. Amsterdam: Kluwer.
D’Ariano, G. M. (2001) Tomographic methods for universal estimation in quantum optics. In Proc. Scuola E.
Fermi Experimental Quantum Computation and Information, Varenna. Bologna: Editrice Compositori.
D’Ariano, G. M. and Lo Presti, P. (2001) Quantum tomography for measuring experimentally the matrix elements
of an arbitrary quantum operation. Phys. Rev. Lett., 86, 4195–4198.
Davies, E. B. (1976) Quantum Theory of Open Systems. London: Academic Press.
Davies, E. B. and Lewis, J. T. (1970) An operational approach to quantum probability. Communs Math. Phys.,
17, 239–260.
Feynman, R. P. (1951) The concept of probability in quantum mechanics. In Proc. 2nd Berkeley Symp. Mathe-
matical Statistics and Probability, pp. 533–541. Berkeley: University of California Press.
Franz, U., Léandre, R. and Schott, R. (2001) Malliavan calculus and Skorohod integration for quantum stochastic
processes. Inf. Dim. Anal. Quant. Probab. Reltd Top., 4, 11–38.
Fujiwara, A. and Nagaoka, H. (1995) Quantum Fisher metric and estimation for pure state models. Phys. Lett.
A, 201, 119–124.
Gardiner, C. W. and Zoller, P. (2000) Quantum Noise, 2nd edn. Berlin: Springer.
Gill, R. D. (2001a) Teleportation into quantum statistics. J. Kor. Statist. Soc., 30, 291–325.
Gill, R. D. (2001b) Asymptotics in quantum statistics. In State of the Art in Probability and Statistics, Festschrift
for W. R. van Zwet (eds M. C. M. de Gunst, C. A. J. Klaassen and A. W. van der Vaart), pp. 255–285. Hayward:
Institute of Mathematical Statistics.
Gill, R. D. (2003) Accardi contra Bell (cum mundi): the impossible coupling. In Mathematical Statistics and
Quantum Statistical Inference 805
Applications: Festschrift for Constance van Eeden (eds M. Moore, S. Froda and C. Léger). Beachwood: Institute
of Mathematical Statistics.
Gill, R. D. and Massar, S. (2000) State estimation for large ensembles. Phys. Rev. A, 61, 2312–2327.
Gilmore, R. (1994) Alice in Quantum Land. Wilmslow: Sigma.
Hannemann, T., Reiss, D., Balzer, C., Neuhauser, W., Toschek, P. E. and Wunderlich, C. (2002) Self-learning
estimation of quantum states. Phys. Rev. A, 65, 050303-1–050303-4.
Helstrom, C. W. (1976) Quantum Detection and Information Theory. New York: Academic Press.
Holevo, A. S. (1982) Probabilistic and Statistical Aspects of Quantum Theory. Amsterdam: North-Holland.
Holevo, A. S. (2001a) Statistical structure of quantum theory. Lect. Notes Phys. Monogr., 67.
Holevo, A. S. (2001b) Lévy processes and continuous quantum measurements. In Lévy Processes—Theory and
Applications (eds O. E. Barndorff-Nielsen, T. Mikosch and S. Resnick), pp. 225–239. Boston: Birkhäuser.
Isham, C. (1995) Quantum Theory. Singapore: World Scientific.
Keyl, M. and Werner, R. F. (2003) Estimating the spectrum of a density operator. Phys. Rev. A, 64, 052311-1–
052311-5.
Kraus, K. (1983) States, effects and operations: fundamental notions of quantum theory. Lect. Notes Phys., 190.
Leonhardt, U. (1997) Measuring the Quantum State of Light. Cambridge: Cambridge University Press.
Loubenets, E. R. (2001) Quantum stochastic approach to the description of quantum measurements. J. Phys. A,
34, 7639–7675.
Malley, J. D. and Hornstein, J. (1993) Quantum statistical inference. Statist. Sci., 8, 433–457.
Matsumoto, K. (2002) A new approach to the Cramér-Rao-type bound of the pure state model. J. Phys. A, 35,
3111–3123.
Meyer, P.-A. (1993) Quantum probability for probabilists. Lect. Notes Math., 1538.
Mølmer, K. and Castin, Y. (1996) Monte Carlo wavefunctions. Coher. Quant. Opt., 7, 193–202.
Mooij, J. E., Orlando, T. P., Levitov, L., Tian, L., van der Wal, C. H. and Lloyd, S. (1999) Josephson persistent-
current qubit. Science, 285, 1036–1039.
Nielsen, M. A. and Chuang, I. L. (2000) Quantum Computation and Quantum Information. New York: Cambridge
University Press.
Ozawa, M. (1985) Conditional probability and a posteriori states in quantum mechanics. Publ. RIMS Kyoto
Univ., 21, 279–295.
Percival, I. (1998) Quantum State Diffusion. Cambridge: Cambridge University Press.
Peres, A. (1995) Quantum Theory: Concepts and Methods. Dordrecht: Kluwer.
Weihs, G., Jennewein, T., Simon, C., Weinfurter, H. and Zeilinger, A. (1998) Violation of Bell’s inequality under
strict Einstein locality conditions. Phys. Rev. Lett., 81, 5039–5043.
Williams, D. (2001) Weighing the Odds. Cambridge: Cambridge University Press.
Wiseman, H. M. (1999) Adaptive quantum measurements (summary). In Miniproc. Wrkshp Stochastics and Quan-
tum Physics, miscellanea, 14. Aarhus: University of Aarhus.
Young, T. Y. (1975) Asymptotically efficient approaches to quantum-mechanical parameter estimation. Inform.
Sci., 9, 25–42.
ξθ
S
Statistical model M
Fig. 1. Square-root density functions correspond to points in the positive orthant of the unit sphere S in
Hilbert space H: for a parametric density function, the state lies on a submanifold M of S with metric given
by the Fisher information matrix
This might convey the impression, to readers who are unfamiliar with quantum mechanics, that the essence
of quantum inference is simply the replacement of the probability density function by a density matrix.
Hence, I shall, instead, focus on the nature of pure states.
Probability theory typically involves a density function p.x|θ/ which may depend on a set of parameters
{θ}. In statistics, we often consider the log-likelihood function lθ .x/ √ = ln{p.x|θ/}. However, in physics
we naturally work with the square-root density function ξθ .x/ = p.x|θ/, which permits a formulation
of the problem in a real Hilbert space context. This was recognized by Mahalanobis (1930, 1936) and
Bhattacharyya (1943, 1946). In particular, if the state ξθ .x/ depends on a set of parameters, then the sta-
tistical model is represented by a finite dimensional submanifold M of the unit sphere S ⊂ H, where the
Fisher information determines the Riemannian metric on M (Rao, 1945, 1947). See Fig. 1. This leads to
the notion of information geometry (see Burbea (1986)).
Note, however, that √ to recover the probability we must square the state ξ, indicating that we could
have defined ξ = − p as the negative root of p. Thus, the relevant state space consists of equivalence
classes, i.e. rays through the origin of H. Hence, all aspects of classical probability can be characterized
by geometric features of real projective spaces (Brody and Hughston, 1999). This idea can be extended by
allowing the state ξ to be a complex-valued function, the probability being defined by p = |ξ|2 . Such a
complex-valued function is what physicists call a wave function or a pure state.
However, complex-valued states are considered in quantum mechanics not merely because this remains
consistent with probability laws but rather because they are indispensable for the study of quantum sys-
tems. The role of complex numbers in quantum mechanics is twofold: first, a complex number rotates a
state by the angle 21 π. Second, it determines the ‘direction of time’ via Schrödinger’s equation. The latter
point implies that quantum mechanics is not a static but a dynamic theory. These two features are relevant
to the estimation problem that is discussed below.
Now, if the quantum state depends on a set of unknown parameters, then the various problems of
statistical inference can be cast in a Hilbert space setting à la Efron. The inference problem for pure states
has been extensively studied in the literature (see, for example, Burbea and Rao (1984), Caianiello (1983),
Caianiello and Guz (1988) and Brody and Hughston (1996, 1997, 1998); see also Yuen et al. (1975),
Helstrom (1982) and Brody and Meister (1996) for further work on quantum hypothesis testing that
extends earlier results by Helstrom and Holevo). Citation of these results would make the paper more
informative to statisticians who are familiar with information geometry and asymptotic inference or stand-
ard hypothesis testing.
To illustrate the subtlety of pure state estimation, consider the inference problem in Section 6 (example 7).
Given a pair of spin half eigenstates |↑ and |↓, a generic pure state is expressible as |ξ.θ, φ/ = cos. 21 θ/|↑ +
sin. 21 θ/ exp.iφ/|↓. The value of θ is assumed fixed, and we seek to estimate the unknown parameter φ.
However, unlike the problems in classical estimation theory, where the state remains static, the complex
square-root density function |ξ evolves in time (the second role of the complex number). More precisely,
only an energy eigenstate can be static in quantum physics.
In this example, unitary evolution consists in rigid rotation of the state space around the axis defined
by the spin eigenstates (Fig. 2). Therefore, the value of θ will remain fixed, whereas the value of φ, to be
estimated, varies according to φ = ωt .mod.2π//, where ω is the difference in energy of the two eigenstates.
808 Discussion on the Paper by Barndorff-Nielsen, Gill and Jupp
-up
>
spin
uncertainty relation:
ξ
h
θ ∆T -> 2 ∆H
φ
φ = ωτ [2π]
spi Schrodinger
n-d
ow Dynamics
n >
Fig. 2. Spin half state space: a generic state is expressible as a superposition of the spin-up and spin-down
eigenstates; Schrödinger evolution corresponds to the rigid rotation of the sphere around the poles determined
by the eigenstates; the ‘speed’ of the Schrödinger trajectory is determined by the Anandan–Aharanov relation;
owing to the energy–time relation, the lower bound on time precision precludes the ‘preparation’ of a state
with a definite relative phase φ
exponential
H family
exp(φH)
quantum
trajectories
exp(iφH)
Fig. 3. One-parameter family of exponential curves and one-parameter families of unitary trajectories: the
unitary trajectories are everywhere orthogonal to the corresponding exponential family; consequently, the
Cramér–Rao lower bound is never attained for the estimation of the unitary trajectory
In this situation, the proper formulation of an estimation problem appears obscure. This can be reme-
died by prescribing that the time parameter t is the object of estimation, but then the results concerning
quantum exponential families and the attainability of the Cramér–Rao bound no longer apply.
To see this, apply the formulation of Efron (1975) to the square-root density function rather than to the
log-likelihood function. Then, as illustrated in Fig. 3 (see Brody and Hughston (1996) for the proof), if
the exponential family admits the form exp.φH/ where H is a random variable, the trajectories of the one-
parameter family of unitary transformations exp.iφH/ are everywhere orthogonal to the corresponding
exponential family (first role of the complex number). Therefore, although the quantum exponential fam-
ilies that were investigated by the authors presumably are relevant in some physical contexts, the authors
have yet to demonstrate this in specific examples.
As for mixed states, these are just (classical) probability distributions ρ.ξ/ over the space of complex
square-root likelihood
functions. The density matrix ρ̂ is defined by the first quadratic moment of the
distribution: ρ̂ = |ξξ|ρ.ξ/ dξ, where the integration extends over the pure state space. For standard
linear observables, the information contained in ρ̂ suffices to determine all statistical properties of the
system. Thus, an alternative approach to quantum inference would be to take the classical results that
are readily applicable to ρ.ξ/, and then to project out the component of the first moment to restore the
corresponding quantum results. An apparent paradox arises in that, if ρ.ξ/ is a classical exponential family,
then the corresponding density matrix ρ̂ is not an exponential family as defined in the paper. I hope that
the authors will further investigate this issue in their forthcoming book on the subject.
N. K. Majumdar (London)
I would like to quote Karl Popper’s statement that quantum theory is a statistical subject, which supports
the paper. I also support the idea that quantum theory does not necessarily need a new definition of
probability which is different from what the statistician is used to. All quantities of interest such as the
probability density represented by |ψ|2 or even by the probability current density can be fully understood
by classical probability.
My next remark is concerned with the fact that the wave function formulation of quantum mechanics
812 Discussion on the Paper by Barndorff-Nielsen, Gill and Jupp
is more familiar to many. The wave function not only gives the position probability |ψ|2 but can also be
expanded into a probability distribution of eigenfunctions for each observable. So, we can find a probability
distribution for each observable. A statistical study may conceivably be made of how the wave function
combines, the probabilistic distribution of eigenvalues and the position probability densities as well as the
deterministic evolution of ψ in time.
We are grateful to the discussants for their interesting and stimulating contributions. We find little to
disagree with, but much worth commenting on.
Road-maps
An issue that is raised by several discussants (in particular, Professor Accardi and Dr Frey) is where our
subject should be located on the map of various disciplines.
The first step is to translate key notions. As Dr Frey remarks, there are already problems with the title.
‘Quantum statistics’ could be the Bose–Einstein and Fermi–Dirac distributions of quantum ensembles,
as opposed to classical multinomial (balls in boxes). Our focus is statistical inference when the stochas-
tic contribution from quantum mechanics is important: a finite number of measurements on individual
systems.
We are not finished after translating key terms and locating our problem area on an existing map. A sci-
entific field has not just a theoretical framework but also motivations and aims. Professor Accardi rightly
locates quantum statistical inference as the study of certain completely positive maps on certain abstract
probability-like spaces. From his point of view, we are studying a special case of systems of the kind one
studies in much more generality in quantum probability.
However, one mathematical framework being formally contained in another does not mean that one
poses, in the smaller field, the same questions as in the larger. Typically, interesting questions for the
‘smaller’ field are meaningless in the larger. Consequently, the tools and results of the larger field, though
possibly relevant, are not prerequisites for entering the smaller field, though it is useful to know that they
are there (cf. probability as part of measure theory).
We appreciate, in this light, Dr Larsson’s reaction to our suggestion that quantum probability is a special
case of classical statistics. Our subject is multifaceted and one pair of spectacles does not suffice.
Eventually, quantum statistical inference might change the maps of adjacent and enveloping disciplines,
as well as being fed by them. We hope that some of the experts from quantum probability would consider
quantum statistical inference; they can contribute. Professor Accardi’s map may help them to take their
bearings.
Missing topics
Many topics had to be omitted from our paper: it is not a complete survey, but the outline of a kernel. As
Professor Bingham and Professor Accardi emphasize, we have left out the rich connections to quantum
probability and quantum stochastic processes, free probability, hypothesis testing (Professor Belavkin),
the relationships between quantum statistics and quantum information theory, Bayesian approaches, non-
parametric models and foundations (Professor Helland). Some of these topics figure in Barndorff-Nielsen
et al. (2001) and they will reappear in our forthcoming monograph. Above all, we keenly miss data examples
(Dr Frey). They will come.
Regarding nonparametric quantum statistics we would like to mention Gill and Guţă (2003).
Quantum probability is important for quantum physics, and deeply connected to classical probability.
Quantum stochastic analysis has applications in the continuous time evolution of quantum systems, quan-
tum field theory, classical stochastic processes and quantum statistical inference (continuous time data).
Professor Belavkin cites early papers which explore further the connections between exponential families
and information bounds. Much more besides can be found in the work of the 1970s and 1980s and needs
to be re-evaluated in the light of modern statistics.
Professor Kent and Dr Brody elaborate on the geometric content of quantum statistical models. We
should mention that the Japanese school (Hayashi, Fujiwara, Matsumoto and Nagaoka) is making rapid
progress in exploiting quantum statistical geometry. The connection to shape analysis is exciting.
One of the fascinations of quantum statistics is the foundational turmoil. As Professor Helland points
out, it is difficult to accept a theory of the world which posits an abstract mathematical structure ‘in the
814 Discussion on the Paper by Barndorff-Nielsen, Gill and Jupp
background’, laying down axiomatically its impact on experience. His point of departure is the structure of
statistical information in potential experiments. The Hilbert space structure emerges from intuitive invari-
ance properties (geometry again). Others are working on similar lines, but physicists do not know much
statistics, whereas we do not know much physics, which means that there is a real chance that connecting
these fields could lead to significant progress.
We mention Belavkin’s (2002) recent approach to the ‘measurement problem’ which considers both
measurement and unitary evolution as a special case of a single, intrinsically stochastic, continuous time,
evolution of a physical system. Professor Belavkin, like Professor Helland, adheres to Bohr’s standpoint,
that there is nothing (but paradoxes) to be gained from trying to find a classical reality behind quantum
randomness. This is echoed in Dr Thompson’s remark that all data (all experience) are classical; quan-
tum statistical inference is concerned with the surprising and beautiful structure behind, connecting the
probabilities.
Bell experiments
Much more could have been written about Bell inequalities. We reintroduce the topic here by describing a
little simulation experiment to generate ±1-valued variables X and Y , whose distribution depends on two
parameters a and b. Specifically, Pr.X = x, Y = y/ = p.x, y; a, b/
= .1 − xya · b/=4,
where x, y = ±1, and
a and b are unit vectors in R3 . This is the distribution of measurement of spins of two spin half particles
in the Bell singlet state.
The problem is to come up with a random variable Z and transformations such that (up to an arbitrarily
good approximation)
from Z and a one constructs X and S,
from Z and b one constructs Y and T ,
.X, Y/, conditional on S = 0 and T = 0, is distributed according to p:
A more difficult problem is to find an exact construction with certain success, i.e. without conditioning
(rejection sampling, post-selection). Bell (1964) tells us that it is impossible. The easier problem can be
solved in many ways; see for instance Larsson (1998) and Accardi et al. (2002) (‘the chameleon effect’).
Bell (1981) knew that experimental post-selection must be prohibited; otherwise the experiment cannot
rule out a classical explanation like our simulation.
As Dr Minozzo and Dr Larsson mention, no Bell experiment has yet been done which does not have an
alternative classical explanation, e.g. our rejection sampling story. For different experiments, the ‘expla-
nation’ must be rather different (compare Minozzo’s story with ours). Often it is rather artificial.
The physicists do their best, and the Weihs et al. (1998) version of Aspect’s experiment seems free of
both defects described by Dr Minozzo, though it still involves post-selection on coincidences. Most phys-
icists believe that a loophole-free experiment can and will be done, maybe even within a couple of years.
However, some consider it a real possibility that quantum mechanics itself will always force loopholes.
We would like to know for what distributions p can we do the simulation exactly, with probability of
success bounded away from 0, uniformly in the parameters? Does it help when, as in quantum mechanics,
we have no action at a distance, i.e. the marginal of X does not depend on b and the marginal of Y does
not depend on a?
Bell experiments form a rich field for classical statistics. Very recently, van Dam et al. (2003) have used
methodology from missing data theory to compare strengths of different Bell-type experiments.
Miscellany
Our decision to concentrate on mixed states was largely pedagogical. (This is in answer to Dr Brody and
Dr Majumdar.) They are central in modern quantum information theory and the reader must become used
to them. Mixed states are mixtures of pure states, so the pure case is included. We agree with Dr Brody
that there can be physical problems when considering our toy models in their most immediate context.
However, quantum statistical models for spin half apply also to the polarization of photons, as well as to
many other physical systems. Ballester (2003) has new results on the quantum tomography of operations
which involves a pure state model that is only marginally more complex than those which we mentioned
as toys, and the experiment is being done right now in a quantum optics laboratory.
Professor Kent asks whether our models are empirical or (physics) theoretical. Well, some do appear
in theory in physics, but a main motivation is just as in classical statistics: models which allow elegant
inference are studied in their own right as paradigms of statistical–theoretical modelling.
Dr Thompson refers to coherence, and Dr Frey to decoherence. Dr Thompson is referring to loss
functions. A characteristic of quantum statistical inference is that, the more we can learn about one par-
Discussion on the Paper by Barndorff-Nielsen, Gill and Jupp 815
ameter, the less we can learn about another. This means that a consideration of loss functions is necessary
at the design stage. As Dr Thompson emphasizes, we need to think carefully what the meaning of the
parameters is.
Decoherence (Dr Frey) refers to the decay of quantum entanglement through interaction with an envi-
ronment. This process is part of quantum physics and part of actual laboratory measurement. Deciding
at which point the quantum modelling can be replaced by classical (often forced by complexity as systems
grow increasingly large) is one of the most difficult jobs of real quantum physics. It is part of the modelling
of ‘what measurement is actually being performed’ on the system of interest.
Professor Bingham’s nice comments on Moyal point both to the past and to the future. Concerning the
connections between statistics and physics, we mention Edwards (2002) on the enormous significance of
Fisher’s background in physics for his work in statistics and genetics.
Many physicists have only hazy ideas of what statistics (in our sense) can mean for them. Ernest Ruther-
ford once said ‘if you need statistics, you did the wrong experiment’. However, he was not always as right
as he was influential. His prediction concerning the possibility (‘inconceivable’) of using the structure of
the atomic nucleus as a practically useful source of energy is a case in point: what about the use of statistics
in exploring the quantum nature of the world as a source of computational power?