Wojciech H. Zurek - Complexity, Entropy and The Physics of Information-Westview Press (1990)
Volume VIII
Santa Fe Institute
Studies in the Sciences of Complexity
CRC Press
Taylor & Francis Group
Boca Raton London New York
It is, however, difficult to deny that the process of information gain can be
directly tied to the ability to extract useful work. Thus, questions concerning ther-
modynamics, the second law, and the arrow of time have become intertwined with
a half-century-old puzzle, that of the problem of measurements in quantum physics.
Quantum measurements are usually analyzed in abstract terms of wave functions
and Hamiltonians. Only very few discussions of the measurement problem
in quantum theory make an explicit effort to consider the crucial issue—the
transfer of information. Yet obtaining knowledge is the very reason for mak-
ing a measurement. Formulating quantum measurements and, more generally,
quantum phenomena in terms of information should throw a new light on the
problem of measurement, which has become difficult to ignore in light of new
experiments on quantum behavior in macroscopic systems.
The distinction between what is and what is known to be, so clear in classi-
cal physics, is blurred, and perhaps does not exist at all on a quantum level. For
instance, energetically insignificant interactions of an object with its quantum en-
vironment suffice to destroy its quantum nature. It is as if the "watchful eye" of
the environment "monitoring" the state of the quantum system forced it to behave
in an effectively classical manner. Yet, even phenomena involving gravity, which
happen on the most macroscopic of all the scales, bear the imprint of quantum
mechanics.
In fact it was recently suggested that the whole Universe—including configura-
tions of its gravitational field—may and should be described by means of quantum
theory. Interpreting results of the calculations performed on such a "Wavefunction
of the Universe" is difficult, as the rules of thumb usually involved in discussions
of experiments on atoms, photons, and electrons assume that the "measuring ap-
paratus" as well as "the observer" are much larger than the quantum system. This
is clearly not the case when the quantum system is the whole Universe. Moreover,
the transition from quantum to classical in the early epochs of the existence of the
Universe is likely to have influenced its present appearance.
Black hole thermodynamics has established a deep and still largely mysterious
connection between general relativity, quantum, and statistical mechanics.
Related questions about the information capacity of physical systems, funda-
mental limits on the capacity of communication channels, the origin of entropy
in the Universe, etc., are a subject of much recent research.
The three subjects above lie largely in the domain of physics. The following is-
sues forge connections between the natural sciences and the science of computation,
or, rather, the subject of information processing regarded in the broadest sense of
the word.
Physics of computation explores limitations imposed by the laws of physics
on the processing of information. It is now established that both classical and
quantum systems can be used to perform computations reversibly. That is,
computation can be "undone" by running the computer backwards. It appears
PROCEEDINGS
This book has emerged from a meeting held during the week of May 29 to June
2, 1989, at St. John's College in Santa Fe under the auspices of the Santa Fe
Institute. The (approximately 40) official participants as well as equally numerous
"groupies" were enticed to Santa Fe by the above "manifesto." The book—like
the "Complexity, Entropy and the Physics of Information" meeting—explores not
only the connections between quantum and classical physics, information and its
transfer, computation, and their significance for the formulation of physical theories,
but it also considers the origins and evolution of the information-processing entities,
their complexity, and the manner in which they analyze their perceptions to form
models of the Universe. As a result, the contributions can be divided into distinct
sections only with some difficulty.
Indeed, I regard this degree of overlapping as a measure of the success of the
meeting. It signifies consensus about the important questions and on the antic-
ipated answers: they presumably lie somewhere in the "border territory," where
information, physics, complexity, quantum, and computation all meet.
ACKNOWLEDGMENTS
I would like to thank the staff of the Santa Fe Institute for excellent (and friendly)
organizational support. In particular, Ginger Richardson was principally responsible
for letting "the order emerge out of chaos" during the meeting. And somehow Ronda
Butler-Villa managed the same feat with this volume.
I would like to gratefully acknowledge the Santa Fe Institute, the Air Force
Office for Scientific Research, and the Center for Nonlinear Studies, Los Alamos
National Laboratory, for the financial (and moral) support which made this meeting
possible.
—Wojciech H. Zurek
Los Alamos National Laboratory
and the Santa Fe Institute
Contents

Foreword
Wojciech H. Zurek ix

I Physics of Information 1

Information, Physics, Quantum: The Search for Links
John Archibald Wheeler 3

Complexity of Models
J. Rissanen 117

Valuable Information
Seth Lloyd 193

Non-Equilibrium Polymers, Entropy, and Algorithmic Information
Dilip K. Kondepudi 199

Indices 513
I Physics of Information

Information, Physics, Quantum: The Search for Links

John Archibald Wheeler
Physics Departments, Princeton University, Princeton, NJ 08544, and University of Texas at Austin, Austin, TX 78712
This report reviews what quantum physics and information theory have
to tell us about the age-old question, "How come existence?" No escape is
evident from four conclusions: (1) The world cannot be a giant machine,
ruled by any pre-established continuum physical law. (2) There is no such
thing at the microscopic level as space or time or spacetime continuum.
(3) The familiar probability function or functional, and wave equation
or functional wave equation, of standard quantum theory provide mere
continuum idealizations and by reason of this circumstance conceal the
information-theoretic source from which they derive. (4) No element in the
description of physics shows itself as closer to primordial than the elemen-
tary quantum phenomenon, that is, the elementary device-intermediated
act of posing a yes-no physical question and eliciting an answer or, in brief,
the elementary act of observer-participancy. Otherwise stated, every phys-
ical quantity, every it, derives its ultimate significance from bits, binary
yes-or-no indications, a conclusion which we epitomize in the phrase, it
from bit.
[2]The appendix of Kepler's Book 5 contains one side, the publications of the English physician
and thinker Robert Fludd (1574-1637) the other side, of a great debate, analyzed by Wolfgang
Pauli.85 Totally in contrast to Fludd's concept of intervention from on high63 was Kepler's guiding
principle, Ubi materia, ibi geometria—where there is matter, there is geometry. It was not
directly from Kepler's writings, however, that Newton learned of Kepler's three great geometry-
driven findings about the motions of the planets in space and in time, but from the distillation of
Kepler offered by Thomas Streete.106
JGST157 offers a brief and accessible summary of Einstein's 1915 and still standard geometro-
dynamics which capitalizes on Elie Cartan's appreciation of the central idea of the theory: the
boundary of a boundary is zero.
[3]See Bohr.17 The mathematics of complementarity I have not been able to discover stated any-
where more sharply, more generally and earlier than in H. Weyl,121 in the statement that the
totality of operators for all the physical quantities of the system in question form an irreducible
set.
The shift in interference fringes between field off and field on reveals the magnitude
of the flux,
This impulse is the source of the force that displaces the indicator needle of the
magnetometer and gives us an instrument reading. We deal with bits wholesale
rather than bits retail when we run the fiducial current through the magnetometer
coil, but the definition of fields founds itself no less decisively on bits.
As a third and final example of it from bit, we recall the wonderful quantum
finding of Bekenstein9,10,11—a totally unexpected denouement of the earlier clas-
sical work of Penrose, Christodoulou,26 and Ruffini27—refined by Hawking,52,53
that the surface area of the horizon of a black hole, rotating or not, measures the
entropy of the black hole. Thus this surface area, partitioned in the imagination
(Figure 1) into domains, each of the size 4ℏ log_e 2, that is, 2.77... times the Planck
area, yields the Bekenstein number, N; and the Bekenstein number, so Thorne and
Zurek explain,173 tells us the number of binary digits, the number of bits, that
would be required to specify in all detail the configuration of the constituents out
of which the black hole was put together. Entropy is a measure of lost information.
To no community of newborn outside observers can the black hole be made to reveal
out of which particular one of the 2^N configurations it was put together. Its size,
an it, is fixed by the number, N, of bits of information hidden within it.
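The arithmetic here is easy to check. A minimal sketch, my own and not from the text, assuming standard CGS values for G, c, and ℏ and using a solar-mass black hole as a test case: the bit count is N = A/(4 l_P² ln 2), where A is the horizon area and l_P² = ℏG/c³ is the Planck area.

```python
import math

# Assumed CGS constants (mine, not from the text)
G = 6.674e-8        # gravitational constant, cm^3 g^-1 s^-2
c = 2.998e10        # speed of light, cm/s
hbar = 1.055e-27    # reduced Planck constant, erg s
M_sun = 1.989e33    # solar mass, g

def bekenstein_bits(mass_g):
    """Bits hidden behind the horizon of a non-rotating black hole:
    N = A / (4 * l_P^2 * ln 2), with horizon area A = 4*pi*(2GM/c^2)^2."""
    r_s = 2 * G * mass_g / c**2       # Schwarzschild radius, cm
    area = 4 * math.pi * r_s**2       # horizon area, cm^2
    planck_area = hbar * G / c**3     # ~2.61e-66 cm^2, as quoted in the text
    return area / (4 * planck_area * math.log(2))

print(f"Planck area: {hbar * G / c**3:.3e} cm^2")
print(f"Solar-mass black hole: {bekenstein_bits(M_sun):.2e} bits")
```

For one solar mass the count comes out near 10^77 bits, each bit-domain occupying 4 ln 2 = 2.77... Planck areas as the text states.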
The quantum, h, in whatever current physics formula it appears, thus serves as
a lamp. It lets us see the horizon area as information lost, understand wave number
of light as photon momentum, and think of field flux as bit-registered fringe shift.
Giving us its as bits, the quantum presents us with physics as information.
How come a value for the quantum so small as ℏ = 2.612 × 10^-66 cm²? As
well ask why the speed of light is so great as c = 3 × 10^10 cm/s! No such constant
as the speed of light ever makes an appearance in a truly fundamental account
3. FOUR NO'S
To the question "How come the quantum?" we thus answer, "Because what we call
existence is an information-theoretic entity." But how come existence? Its as bits,
yes; and physics as information, yes; but whose information? How does the vision
of one world arise out of the information-gathering activities of many observer-
participants? In the consideration of these issues we adopt for guidelines four no's.
FIRST NO
"No tower of turtles," advised William James. Existence is not a globe supported
by an elephant, supported by a turtle, supported by yet another turtle, and so
on. In other words, no infinite regress. No structure, no plan of organization, no
framework of ideas underlaid by another structure or level of ideas, underlaid by
yet another level, and yet another, ad infinitum, down to bottomless blackness. To
endlessness no alternative is evident but a loop,[4] such as: Physics gives rise to
observer-participancy; observer-participancy gives rise to information; and infor-
mation gives rise to physics.
Is existence thus built99 on "insubstantial nothingness"? Rutherford and Bohr
made a table no less solid when they told us it was 99.9... percent emptiness.
Thomas Mann may exaggerate when he suggested that "... we are actually bringing
about what seems to be happening to us," but Leibniz69 reassures us that "although
the whole of this life were said to be nothing but a dream and the physical world
nothing but a phantasm, I should call this dream or phantasm real enough if, using
reason well, we were never deceived by it."
SECOND NO
No laws. "So far as we can see today, the laws of physics cannot have existed from
everlasting to everlasting. They must have come into being at the big bang. There
were no gears and pinions, no Swiss watchmakers to put things together, not even
a pre-existing plan.... Only a principle of organization which is no organization
at all would seem to offer itself. In all of mathematics, nothing of this kind more
obviously offers itself than the principle that 'the boundary of boundary is zero.'
Moreover, all three great field theories of physics use this principle twice over....
This circumstance would seem to give us some reassurance that we are talking sense
when we think of... physics being"142 as foundation-free as a logic loop, the closed
circuit of ideas in a self-referential deductive axiomatic system.105,34,70,159
The universe as a machine? Is this universe one among a great ensemble of
machine universes, each differing from the others in the values of the dimensionless
constants of physics? Is our own selected from this ensemble by an anthropic princi-
ple of one or another form?7 We reject here the concept of universe not least because
it "has to postulate explicitly or implicitly, a supermachine, a scheme, a device, a
miracle, which will turn out universes in infinite variety and infinite number."156
Directly opposite to the concept of universe as machine built on law is the
vision of a world self-synthesized. In this view, the notes struck out on a piano by
the observer-participants of all places and all times, bits though they are, in and
by themselves constitute the great wide world of space and time and things.
THIRD NO
No continuum. No continuum in mathematics and therefore no continuum in
physics. A half-century of development in the sphere of mathematical logic151 has
made it clear that there is no evidence supporting the belief in the existential char-
acter of the number continuum. "Belief in this transcendental world," Hermann
Weyl tells us, "taxes the strength of our faith hardly less than the doctrines of
the early Fathers of the Church or of the scholastic philosophers of the Middle
Ages."122 This lesson out of mathematics applies with equal strength to physics.
"Just as the introduction of the irrational numbers ... is a convenient myth [which]
simplifies the laws of arithmetic ...so physical objects," Willard Van Orman Quine
tells us,92 "are postulated entities which round out and simplify our account of the
flux of existence .... The conceptual scheme of physical objects is a convenient myth,
simpler than the literal truth and yet containing that literal truth as a scattered
part."
Nothing so much distinguishes physics as conceived today from mathematics
as the difference between the continuum character of the one and the discrete char-
acter of the other. Nothing does so much to extinguish this gap as the elementary
quantum phenomenon "brought to a close," as Bohr puts it,19 by "an irreversible
[5]See for example the survey by S. Feferman, "Turing in the Land of O(z)," pages 113-147, and
related papers on mathematical logic in R. Herken.56
4. FIVE CLUES
FIRST CLUE
The boundary of a boundary is zero. This central principle of algebraic topology,103
identity, triviality, tautology though it is, is also the unifying theme of Maxwell
electrodynamics, Einstein geometrodynamics, and almost every version of modern
field theory.[9] That one can get so much from so little, almost everything from
almost nothing, inspires hope that we will someday complete the mathematization
of physics and derive everything from nothing, all law from no law.
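The principle can be exhibited concretely. The following sketch, my own illustration rather than anything from the text, builds the two boundary operators of a single filled triangle and checks that their composition vanishes:

```python
# Boundary operators for one filled triangle [0,1,2], integer coefficients.
# Columns of B1: edges (0,1), (0,2), (1,2); rows: vertices 0, 1, 2.
# Each edge (i,j) has boundary (vertex j) - (vertex i).
B1 = [
    [-1, -1,  0],
    [ 1,  0, -1],
    [ 0,  1,  1],
]
# Column of B2: the face (0,1,2) on those edges: +(1,2) - (0,2) + (0,1).
B2 = [
    [ 1],
    [-1],
    [ 1],
]

def matmul(A, B):
    """Plain integer matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# "The boundary of a boundary is zero": the composition annihilates everything.
print(matmul(B1, B2))   # [[0], [0], [0]]
```

The oriented edge contributions cancel in pairs at every vertex, which is the identity-tautology-triviality character of the principle made visible.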
161Discovered among the graffiti in the men's room of the Pecan Street Cafe, Austin, Texas.
11 see wheekr.224 0.25 and MTW,77 section 43.4.
[8)See Wheeler,132 page 411.
Msee MTW77, Chapter 15; Atiyah,6 cartan,23 ,24 and Kheyfets and Wheeler.64
SECOND CLUE
No question, no answer. Better put, no bit-level question, no bit-level answer. So
it is in the game of twenty questions in its surprise version.[10] And so it is for
the electron circulating within the atom or a field within a space. To neither field
nor particle can we attribute a coordinate or momentum until a device operates to
measure the one or the other. Moreover, any apparatus that accurately measures
the one quantity inescapably rules out then and there the operation of equipment to
measure the other.17,18,55,121 In brief, the choice of question asked, and the choice
of when it's asked, play a part—not the whole part, but a part—in deciding what
we have the right to say.149,152
Bit-registration of a chosen property of the electron, a bit-registration of the
arrival of a photon, Aharonov-Bohm bit-based determination of the magnitude
of a field flux, bulk-based count of bits bound in a black hole: all are examples
of physics expressed in the language of information. However, into a bit count
that one might have thought to be a private matter, the rest of the nearby world
irresistibly thrusts itself. Thus the atom-to-atom distance in a ruler—basis for a
bit count of distance—evidently has no invariant status, depending as it does on
the temperature and pressure of the environment. Likewise the shift of fringes in
the Aharonov-Bohm experiment depends not only upon the magnetic flux itself,
but also on the charge of the electron. But this electron charge—when we take
the quantum itself to be nature's fundamental measuring unit—is governed by the
square root of the quantity e²/ℏc = 1/137.036 ..., a "constant" which—for extreme
conditions—is as dependent on the local environment47 as is a dielectric "constant"
or the atom-to-atom spacing in the ruler.
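The quoted value is simple to reproduce. A minimal check, assuming standard CGS values for e, ℏ, and c; the numerical constants are mine, not from the text:

```python
# Assumed CGS values (mine, not from the text): electron charge in esu,
# reduced Planck constant in erg s, speed of light in cm/s.
e_charge = 4.8032e-10
hbar = 1.0546e-27
c = 2.9979e10

# The dimensionless combination e^2 / (hbar * c): the fine-structure constant.
alpha = e_charge**2 / (hbar * c)
print(1.0 / alpha)   # close to 137.036
```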
The contribution of the environment becomes overwhelmingly evident when we
turn from length of bar or flux of field to the motion of alpha particle through cloud
chamber, dust particle through 3°K-background radiation or Moon through space.
This we know from the analyses of Bohr and Mott,79 Zeh,167,168 Joos and Zeh,61
Zurek,170,171,172 and Unruh and Zurek.113 It from bit, yes; but the rest of the world
also makes a contribution, a contribution that suitable experimental design can
minimize but not eliminate. Unimportant nuisance? No. Evidence the whole show
is wired up together? Yes. Objection to the concept of every it from bits? No.
Build physics, with its false face of continuity, on bits of information! What
this enterprise is we perhaps see more clearly when we examine for a moment a
thoughtful, careful, wide-reaching exposition51 of the directly opposite thesis, that
physics at bottom is continuous; that the bit of information is not the basic en-
tity. Rate as false the claim that the bit of information is the basic entity. Instead,
attempt to build everything on the foundation of some "grand unified field the-
ory" such as string theory26,46 —or, in default of that, on Einstein's 1915 and still
standard geometrodynamics. Hope to derive that theory by way of one or another
plausible line of reasoning. But don't try to derive quantum theory. Treat it as
supplied free of charge from on high. Treat quantum theory as a magic sausage
grinder which takes in as raw meat this theory, that theory, or the other theory,
and turns out a "wave equation," one solution of which is "the" wave function for
the universe.50,51,54,115,126 From start to finish accept continuity as right and nat-
ural: continuity in the manifold, continuity in the wave equation, continuity in its
solution, continuity in the features that it predicts. Among conceivable solutions
of this wave equation select as reasonable one which "maximally decoheres," one
which exhibits "maximal classicity"—maximal classicity by reason, not of "some-
thing external to the framework of wave function and Schrödinger equation," but
something in "the initial conditions of the universe specified within quantum theory
itself."
How do we compare the opposite outlooks of decoherence and it-from-bit?
Remove the casing that surrounds the workings of a giant computer. Examine the
bundles of wires that run here and there. What is the status of an individual wire?
The mathematical limit of the bundle? Or the building block of the bundle? The
one outlook regards the wave equation and wave function to be primordial and
precise and built on continuity, and the bit to be an idealization. The other outlook
regards the bit to be the primordial entity, and wave equation and wave function
to be secondary and approximate—and derived from bits via information theory.
Derived, yes; but how? No one has done more than William Wootters toward
opening up a pathway161,162 from information to quantum theory. He puts into
connection two findings, long known, but little known. Already before the ad-
vent of wave mechanics, he notes, the analyst of population statistics R. A. Fisher
proved40,41 that the proper tool to distinguish one population from another is not
the probability of this gene, that gene, and the third gene (for example), but the
square roots of these probabilities; that is to say, the two probability amplitudes,
each probability amplitude being a vector with three components. More precisely,
Wootters proves, the distinguishability between the two populations is measured by
the angle in Hilbert space between the two state vectors, both real. Fisher, however,
was dealing with information that sits "out there." In microphysics, however, the
information does not sit out there. Instead, nature in the small confronts us with a
revolutionary pistol, "No question, no answer." Complementarity rules. And com-
plementarity, as E. C. G. Stueckelberg proved107,108 as long ago as 1952, and as
Saxon made more readily understandable95 in 1964, demands that the probability
amplitudes of quantum physics must be complex. Thus Wootters derives famil-
iar Hilbert space with its familiar complex probability amplitudes from the twin
demands of complementarity and measure of distinguishability.
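Wootters' measure of distinguishability can be sketched in a few lines. This is my own illustration, not code from the text: represent each population by the square roots of its probabilities, a real unit vector, and take the angle between the two vectors.

```python
import math

def statistical_angle(p, q):
    """Fisher/Wootters statistical distance: the angle between the real
    amplitude vectors (sqrt(p_i)) and (sqrt(q_i)). Both p and q must be
    probability distributions over the same set of outcomes."""
    overlap = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp against floating-point drift before taking the arccosine.
    return math.acos(min(1.0, max(-1.0, overlap)))

# Two three-allele gene-frequency "populations," as in Fisher's setting
# (illustrative numbers, mine):
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(statistical_angle(p, p))   # identical populations: angle 0
print(statistical_angle(p, q))   # nearby populations: small positive angle
```

Identical distributions sit at angle zero, distributions with disjoint support at a right angle; each probability amplitude is indeed a unit vector with one component per outcome, exactly as in the three-gene example above.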
Should we try to go on from Wootters' finding to deduce the full blown machin-
ery of quantum field theory? Exactly not to try to do so—except as an idealization—
is the demand laid on us by the concept of it from bit. How come?
Probabilities exist "out there" no more than do space or time or the position
of the atomic electron. Probability, like time, is a concept invented by humans, and
humans have to bear the responsibility for the obscurities that attend it. Obscurities
there are whether we consider probability defined as frequency67 or defined a la
Bayes.60,94,97,114 Probability in the sense of frequency has no meaning as applied
to the spontaneous fission of the particular plutonium nucleus that triggered the
November 1, 1952 H-bomb blast.
What about probabilities of a Bayesian cast, probabilities "interpreted not
as frequencies observable through experiments, but as degrees of plausibility one
assigns to each hypothesis based on the data and on one's assessment of the plausi-
bility of the hypotheses prior to seeing the data"?31 Belief-dependent probabilities,
different probabilities assigned to the same proposition by different people?14 Proba-
bilities associated21 with the view that "objective reality is simply an interpretation
of data agreed to by large numbers of people?"
Heisenberg directs us to the experiences8 of the early nuclear-reaction-rate the-
orist Fritz Houtermans, imprisoned in Kharkov during the time of the Stalin ter-
ror: "... the whole cell would get together to produce an adequate confession ...
[and] helped [the prisoners] to compose their 'legends' and phrase them properly,
implicating as few others as possible."
Existence as confession? A myopic but in some ways illuminating formulation
of the demand for intercommunication implicit in the theme of it from bit!
So much for "No question, no answer."
THIRD CLUE
The super-Copernican principle.188 This principle rejects now-centeredness in any
account of existence as firmly as Copernicus repudiated here-centeredness. It re-
pudiates most of all any tacit adoption of now-centeredness in assessing observer-
participants and their number.
What is an observer-participant? One who operates an observing device and
participates in the making of meaning, meaning in the sense of Follesdal,42 "Mean-
ing is the joint product of all the evidence that is available to those who commu-
nicate." Evidence that is available? The investigator slices a rock and photographs
the evidence for the heavy nucleus that arrived in the cosmic radiation of a billion
years ago.149 Before he can communicate his findings, however, an asteroid atom-
izes his laboratory, his records, his rocks, and him. No contribution to meaning!
Or at least no contribution then. A forensic investigation of sufficient detail and
wit to reconstruct the evidence of the arrival of that nucleus is difficult to imagine.
What about the famous tree that fell in the forest with no one around?18 It leaves
a fallout of physical evidence so near at hand and so rich that a team of up-to-
date investigators can establish what happened beyond all doubt. Their findings
contribute to the establishment of meaning.
"Measurements and observations," it has been said,58 "cannot be fundamental
notions in a theory which seeks to discuss the early universe when neither existed."
On this view the past has a status beyond all questions of observer-participancy.
It from bit offers us a different vision: "reality is theory"[11]; "the past has no
evidence except as it is recorded in the present."[12] The photon that we are going
[11]See T. Segerstedt as quoted in Wheeler,132 page 415.
[12]See Wheeler,131 page 41.
to register tonight from that four billion-year-old quasar cannot be said to have
had an existence "out there" three billion years ago, or two (when it passed an
intervening gravitational lens) or one, or even a day ago. Not until we have fixed
arrangements at our telescope do we register tonight's quantum as having passed
to the left (or right) of the lens or by both routes (as in a double-slit experiment).
This registration, like every delayed-choice experiment,75,131 reminds us that no
elementary quantum phenomenon is a phenomenon until, in Bohr's words,19 "It
has been brought to a close" by "an irreversible act of amplification." What we call
the past is built on bits.
Enough bits to structure a universe so rich in features as we know this world to
be? Preposterous! Mice and men and all on Earth who may ever come to rank as
intercommunicating meaning-establishing observer-participants will never mount a
bit count sufficient to bear so great a burden.
The count of bits needed, huge though it may be, nevertheless, so far as we
can judge, does not reach infinity. In default of a better estimate, we follow familiar
reasoning189 and translate into the language of bits the entropy of the primordial
cosmic fireball as deduced from the entropy of the present 2.735 deg K (uncertainty
< 0.05 deg K) microwave relict radiation totaled over a 3-sphere of radius 13.2 ×
10^9 light years (uncertainty > 35%)[13] or 1.25 × 10^28 cm and of volume 2π² radius³:

    number of bits = 8 × 10^88.
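The figure of 8 × 10^88 follows from elementary arithmetic. A rough sketch, assuming standard CGS values for the radiation constant and Boltzmann's constant (those two constants are mine; the temperature and radius are the ones quoted above):

```python
import math

# Assumed CGS constants (mine, not from the text)
a_rad = 7.566e-15    # blackbody radiation constant, erg cm^-3 K^-4
k_B = 1.381e-16      # Boltzmann constant, erg/K

T = 2.735            # present microwave-background temperature, K (from the text)
radius = 1.25e28     # 3-sphere radius, cm (from the text)

entropy_density = (4.0 / 3.0) * a_rad * T**3           # erg cm^-3 K^-1
volume = 2 * math.pi**2 * radius**3                    # 3-sphere volume, cm^3
bits = entropy_density * volume / (k_B * math.log(2))  # total entropy in bits

print(f"{bits:.1e} bits")   # of order 10^88
```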
It would be totally out of place to compare this overpowering number with the num-
ber of bits of information elicited up to date by observer-participancy. So warns the
super-Copernican principle. We today, to be sure, through our registering devices,
give a tangible meaning to the history of the photon that started on its way from
a distant quasar long before there was any observer-participancy anywhere. How-
ever, the far more numerous establishers of meaning of time to come have a like
inescapable part—by device-elicited question and registration of answer—in gener-
ating the "reality" of today. For this purpose, moreover, there are billions of years
yet to come, billions on billions of sites of observer-participancy yet to be occu-
pied. How far foot and ferry have carried meaning-making communication in fifty
thousand years gives faint feel for how far interstellar propagation is destined82,59
to carry it in fifty billion years.
Do bits needed balance bits achievable? They must, declares the concept of
"world as system self-synthesized by quantum networking."156 By no prediction
does this concept more clearly expose itself to destruction, in the sense of Popper.90
[13]See MTW,77 page 738, Box 27.4; or JGST,157 Chapter 13, page 242.
FOURTH CLUE
"Consciousness." We have traveled what may seem a dizzying path. First, ele-
mentary quantum phenomenon brought to a close by an irreversible act of am-
plification. Second, the resulting information expressed in the form of bits. Third,
this information used by observer-participants—via communication—to establish
meaning. Fourth, from the past through the billenniums to come, so many observer-
participants, so many bits, so much exchange of information, as to build what we
call existence.
Doesn't this it-from-bit view of existence seek to elucidate the physical world,
about which we know something, in terms of an entity about which we know al-
most nothing, consciousness?22,33,43,44 And doesn't Marie Sklodowska Curie tell us,
"Physics deals with things, not people"? Using such and such equipment, making
such and such a measurement, I get such and such a number. Who I am has nothing
to do with this finding. Or does it? Am I sleepwalking?28 Or am I one of those poor
souls without the critical power to save himself from pathological science?57,100,66
Under such circumstances any claim to have "measured" something falls flat until it
can be checked out with one's fellows. Checked how? Morton White reminds us
how the community applies its tests of credibility, and in this connection quotes
analyses by Chauncey Wright, Josiah Royce, and Charles Sanders Peirce.[14] Par-
menides of Elea83 (c. 515 B.C.-450 B.C.) may tell us that "What is ... is identical
with the thought that recognizes it." We, however, steer clear of the issues con-
nected with "consciousness." The line between the unconscious and the conscious
begins to fade91 in our day as computers evolve and develop—as mathematics has—
level upon level upon level of logical structure. We may someday have to enlarge
the scope of what we mean by a "who." This granted, we continue to accept—as
an essential part of the concept of it from bit—Follesdal's guideline,42 "Meaning is
the joint product of all the evidence that is available to those who communicate."
What shall we say of a view of existence[15] that appears, if not anthropomorphic in
its use of the word "who," still overly centered on life and consciousness? It would
seem more reasonable to dismiss for the present the semantic overtones of "who"
and explore and exploit the insights to be won from the phrases, "communication"
and "communication employed to establish meaning."
Follesdal's statement supplies not an answer, but the doorway to new questions.
For example, man has not yet learned how to communicate with an ant. When he
[14]See Peirce,87 especially passages from pages 335-337, 353, and 358. Peirce's position on the
forces of nature, "May they not have naturally grown up," foreshadow though it does the concept
of the world as a self-synthesized system, differs from it in one decisive point, in that it tacitly
takes time as primordial category supplied free of charge from outside.
[15]See von Schelling, especially volume 5, pages 428-430, as kindly summarized for me by
B. Kanitscheider: "that the Universe possesses from the outset an immanent goal, a teleological
structure, and in all its products is directed toward evolutionary stages that finally include the
bringing forth of self-consciousness, which then in turn reflects upon the process of its own
genesis; and this reflection is the necessary condition for the constitution of the objects of
consciousness."
does, will the questions put to the world around by the ant and the answers that
he elicits contribute their share, too, to the establishment of meaning? As another
issue associated with communication, we have yet to learn how to draw the line
between a communication network that is closed, or parochial, and one that is
open. And how to use that difference to distinguish between reality and poker—or
another game116,118—so intense as to appear more real than reality. No term in
Follesdal's statement poses greater challenge to reflection than "communication,"
descriptor of a domain of investigation88,98,93 that enlarges in sophistication with
each passing year.
More is different.5 Not by plan but by inner necessity, a sufficiently large number of
H2O molecules collected in a box will manifest solid, liquid, and gas phases. Phase
changes, superfluidity, and superconductivity all bear witness to Anderson's pithy
point, more is different.
We do not have to turn to objects so material as electrons, atoms, and molecules
to see big numbers generating new features. The evolution from small to large has
already in a few decades forced on the computer a structure73,96 reminiscent of bi-
ology by reason of its segregation of different activities into distinct organs. Distinct
organs, too, the giant telecommunications system of today finds itself inescapably
evolving. Will we someday understand time and space and all the other fea-
tures that distinguish physics—and existence itself—as the similarly self-generated
organs of a self-synthesized information system?165,65
5. CONCLUSION
The spacetime continuum? Even continuum existence itself? Except as an idealiza-
tion neither the one entity nor the other can make any claim to be a primordial
category in the description of nature. It is wrong, moreover, to regard this or that
physical quantity as sitting "out there" with this or that numerical value in default
of the question asked and the answer obtained by way of an appropriate observing
device. The information thus solicited makes physics and comes in bits. The count
of bits drowned in the dark night of a black hole displays itself as horizon area,
expressed in the language of the Bekenstein number. The bit count of the cosmos,
however it is figured, is ten raised to a very large power. So also is the number of
elementary acts of observer-participancy over any time of the order of fifty billion
years. And, except via those time-leaping quantum phenomena that we rate as el-
ementary acts of observer-participancy, no way has ever offered itself to construct
what we call "reality." That's why we take seriously the theme of it from bit.
Information, Physics, Quantum: The Search for Links 17
6. AGENDA
Intimidating though the problem of existence continues to be, the theme of it from
bit breaks it down into six issues that invite exploration:
1. Go beyond Wootters and determine what, if anything, has to be added to dis-
tinguishability and complementarity to obtain all of standard quantum theory.
2. Translate the quantum versions of string theory and of Einstein's geometrody-
namics from the language of continuum to the language of bits.
3. Sharpen the concept of bit. Determine whether "an elementary quantum phe-
nomenon brought to a close by an irreversible act of amplification" has at bottom
(1) the 0-or-1 sharpness of definition of bit number nineteen in a string of binary
digits, or (2) the accordion property of a mathematical theorem, the length of
which, that is, the number of supplementary lemmas contained in which, the
analyst can stretch or shrink according to his convenience.
4. Survey one by one with an imaginative eye the powerful tools that mathematics
—including mathematical logic—has won and now offers to deal with theorems
on a wholesale rather than a retail level, and for each such technique work
out the transcription into the world of bits. Give special attention to one and
another deductive axiomatic system which is able to refer to itself,102 one and
another self-referential deductive system.
5. From the wheels-upon-wheels-upon-wheels evolution of computer programming
dig out, systematize, and display every feature that illuminates the level-upon-
level-upon-level structure of physics.
6. Capitalize on the findings and outlooks of information theory,25,30,111,166 algo-
rithmic entropy,174 evolution of organisms,35,33,81 and pattern recogni-
tion.1,13,48,76,101,104,110,119 Search out every link that each has with physics
at the quantum level. Consider, for instance, the string of bits 1111111... and
its representation as the sum of the two strings 1001110... and 0110001.... Ex-
plore and exploit the connection between this information-theoretic statement
and the findings of theory and experiment on the correlation between the polar-
izations of the two photons emitted in the annihilation of singlet positronium123
and in like Einstein-Podolsky-Rosen experiments.16 Seek out, moreover, every
realization in the realm of physics of the information-theoretic triangle inequal-
ity recently discovered by Zurek.173
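The bit-string decomposition mentioned in item 6 can be checked mechanically; the sketch below assumes "sum" means bitwise addition modulo 2 and recovers the all-ones string from the two strings quoted above:

```python
# Bitwise sum modulo 2 (XOR) of the two seven-bit strings quoted in item 6.
a = "1001110"
b = "0110001"
total = "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))
print(total)  # the two strings disagree in every position, so the sum is all ones
```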
Deplore? No, celebrate the absence of a clean clear definition of the term "bit"
as the elementary unit in the establishment of meaning. We reject "that view of
science which used to say, 'Define your terms before you proceed.' The truly creative
nature of any forward step in human knowledge," we know, "is such that theory,
concept, law, and method of measurement—forever inseparable—are born into the
world in union."109 If and when we learn how to combine bits in fantastically large
numbers to obtain what we call existence, we will know better what we mean both
by bit and by existence.
A single question animates this report: Can we ever expect to understand ex-
istence? Clues we have, and work to do, to make headway on that issue. Surely
someday, we can believe, we will grasp the central idea of it all as so simple, so
beautiful, so compelling that we will say to each other, "Oh, how could it have
been otherwise! How could we all have been so blind so long!"
ACKNOWLEDGMENTS
For discussion, advice, or judgment on one or another issue taken up in this review,
I am indebted to Nandor Balazs, John D. Barrow, Charles H. Bennett, David
Deutsch, Robert H. Dicke, Freeman Dyson, and the late Richard P. Feynman as
well as David Gross, James B. Hartle, John J. Hopfield, Paul C. Jeffries, Bernulf
Kanitscheider, Arkady Kheyfets, and Rolf W. Landauer; and to Warner A. Miller,
John R. Pierce, Willard Van Orman Quine, Benjamin Schumacher, and Frank J.
Tipler as well as William G. Unruh, Morton White, Eugene P. Wigner, William K.
Wootters, Hans Dieter Zeh, and Wojciech H. Zurek. For assistance in preparation
of this report I thank E. L. Bennett and NSF grant PHY245-6243 to Princeton
University. I give special thanks to the Santa Fe Institute and the organizers of the
May-June 1989 Conference on Complexity, Entropy, and the Physics of Information
at which the then-current version of the present analysis was reported.
This report evolved from presentations at the Santa Fe Institute conferences,
May 29-June 2 and June 4-8, 1989, and at the 3rd International Symposium on
Foundations of Quantum Mechanics in the Light of New Technology, Tokyo, Au-
gust 28-31, 1989, under the title "Information, Physics, Quantum: The Search
for Links"; and headed "Can We Ever Expect to Understand Existence?" as the
Penrose Lecture at the April 20-22, 1989, annual meeting of Benjamin Franklin's
"American Philosophical Society, Held at Philadelphia for Promoting Useful Knowl-
edge," and at the Accademia Nazionale dei Lincei Conference on La Verità nella
Scienza, Rome, October 13, 1989; submitted to the proceedings of all four in ful-
fillment of obligation and in deep appreciation for hospitality.
REFERENCES
Three reference abbreviations: JGST=157, MTW=77, and WZ=148.
1. Agu, M. "Field Theory of Pattern Recognition." Phys. Rev. A 37 (1988):
4415-4418.
2. Aharonov, Y., and D. Bohm. "Significance of Electromagnetic Potentials in
the Quantum Theory." Phys. Rev. 115 (1959):485-491.
3. Anandan, J. "Comment on Geometric Phase for Classical Field Theories."
Phys. Rev. Lett. 60 (1988):2555.
4. Anandan, J., and Y. Aharonov. "Geometric Quantum Phase and Angles." Phys.
Rev. D 38 (1988):1863-1870. Includes references to the literature of the sub-
ject.
5. Anderson, P. W. "More is Different." Science 177 (1972):393-396.
6. Atiyah, M. Collected Papers, Vol. 5: Gauge Theories. Oxford: Clarendon,
1988.
7. Barrow, J. D., and F. J. Tipler. The Anthropic Cosmological Principle. New
York: Oxford Univ. Press, 1986. Also the literature therein cited.
8. Beck, F. [pseudonym of the early nuclear-reaction-rate theorist Fritz Houter-
mans], and W. Godin. Translated from the German original by E. Mosbacher
and D. Porter. Russian Purge and the Extraction of Confessions. London:
Hurst and Blackett, 1951.
9. Bekenstein, J. D. "Black Holes and the Second Law." Nuovo Cimento Lett. 4
(1972):737-740.
10. Bekenstein, J. D. "Generalized Second Law of Thermodynamics in Black-Hole
Physics." Phys. Rev. D 8 (1973):3292-3300.
11. Bekenstein, J. D. "Black-Hole Thermodynamics." Physics Today 33 (1980):
24-31.
12. Bell, J. S. Collected Papers in Quantum Mechanics. Cambridge, UK: Cam-
bridge Univ. Press, 1987.
13. Bennett, B. M., D. D. Hoffman, and C. Prakash Observer Mechanics: A
Formal Theory of Perception. San Diego: Academic Press, 1989.
14. Berger, J. O., and D. A. Berry. "Statistical Analysis and the Illusion of Ob-
jectivity." Am. Scientist 76 (1988):159-165.
15. Berkeley, G. Treatise Concerning the Principles of Understanding. Dublin,
1710; 2nd edition, 1734. Regarding his reasoning that "No object exists apart
from mind," cf. article on Berkeley by R. Adamson, Encyclopaedia Britannica,
Chicago 3 (1959), 438.
16. Bohm, D. "The Paradox of Einstein, Rosen and Podolsky." Originally pub-
lished in Quantum Theory, chapter 22, sections 15-19. Englewood Cliffs, NJ:
Prentice-Hall, 1951. Reprinted in WZ,148 pp. 356-368.
17. Bohr, N. "The Quantum Postulate and the Recent Development of Atomic
Theory." Nature 121 (1928):580-590.
18. Bohr, N., and L. Rosenfeld. "Zur Frage der Messbarkeit der elektromagnetis-
chen Feldgrössen." Mat.-fys. Medd. Dan. Vid. Selsk. 12(8) (1933). English
translation by Aage Petersen, 1979; reprinted in WZ,148 pp. 479-534.
19. Bohr, N. "Can Quantum-Mechanical Description of Physical Reality be Con-
sidered Complete?" Phys. Rev. 48 (1935):696-702. Reprinted in WZ,148 pp.
145-151.
20. Brink, L., and M. Henneaux. Principles of String Theory: Studies of the Cen-
tro de Estudios Cientificos de Santiago. New York: Plenum, 1988.
21. Burke, J. The Day the Universe Changed. Boston, MA: Little, Brown, 1985.
22. Calvin, W. H. The Cerebral Symphony. New York: Bantam, 1990.
23. Cartan, E. La Géométrie des Espaces de Riemann, Mémorial des Sciences
Mathématiques. Paris: Gauthier-Villars, 1925.
24. Cartan, E. Leçons sur la Géométrie des Espaces de Riemann. Paris: Gauthier-
Villars, 1925.
25. Chaitin, G. J. Algorithmic Information Theory, revised 1987 edition. Cam-
bridge, UK: Cambridge Univ. Press, 1988.
26. Christodoulou, D. "Reversible and Irreversible Transformations in Black-Hole
Physics." Phys. Rev. Lett. 25 (1970):1596-1597.
27. Christodoulou, D., and R. Ruffini. "Reversible Transformations of a Charged
Black Hole." Phys. Rev. D 4 (1971):3552-3555.
28. Collins, W. W. The Moonstone. London, 1868.
29. Darwin, C. W. (1809-1882). On the Origin of Species by Means of Natural
Selection, or the Preservation of Favoured Races in the Struggle for Life. Lon-
don, 1859.
30. Delahaye, J.-P. "Chaitin's Equation: An Extension of Gödel's Theorem." No-
tices Amer. Math. Soc. 36 (1989):984-987.
31. Denning, P. J. "Bayesian Learning." Am. Scientist 77 (1989):216-218.
32. d'Espagnat, B. Reality and the Physicist: Knowledge, Duration and the Quan-
tum World. Cambridge, UK: Cambridge Univ. Press, 1989.
33. Edelman, G. M. Neural Darwinism. New York: Basic Books, 1987.
34. Ehresmann, C. Categories et Structures. Paris: Dunod, 1965.
35. Eigen, M., and R. Winkler. Das Spiel: Naturgesetze steuern den Zufall.
München: Piper, 1975.
36. Einstein, A., to J. J. Laub, 1908, undated, Einstein Archives; scheduled for
publication in The Collected Papers of Albert Einstein, a group of volumes on
the Swiss years 1902-1914, Volume 5: Correspondence, 1902-1914, Princeton
University Press, Princeton, New Jersey.
37. Einstein, A. "Zur allgemeinen Relativitätstheorie." Preuss. Akad. Wiss.
Berlin, Sitzber (1915), 799-801, 832-839, 844-847; (1916), 688-696; and
(1917), 142-152.
38. Einstein, A. As quoted by A. Forsee in Albert Einstein, Theoretical Physicist.
New York: Macmillan, 1963, 81.
39. Elsasser, W. M. Reflections on a Theory of Organisms. Frelighsburg, Quebec:
Orbis, 1987.
40. Fisher, R. A. "On the Dominance Ratio." Proc. Roy. Soc. Edin. 42 (1922):
321-341.
41. Fisher, R. A. Statistical Methods and Statistical Inference. New York: Hefner,
1956, 8-17.
42. Follesdal, D. "Meaning and Experience." In Mind and Language, edited by S.
Guttenplan. Oxford: Clarendon, 1975, 25-44.
43. Fuller, R. W., and P. Putnam. "On the Origin of Order in Behavior." General
Systems (Ann Arbor, MI) 12 (1966):111-121.
44. Fuller, R. W. "Causal and Moral Law: Their Relationship as Examined in
Terms of a Model of the Brain." Monday Evening Papers. Middletown, CT:
Wesleyan Univ. Press, 1967.
45. Green, M. B., J. H. Schwarz, and E. Witten. Superstring Theory. Cambridge,
UK: Cambridge Univ. Press, 1987.
46. Greenberger, D. M., ed. New Techniques and Ideas in Quantum Measurement
Theory. Annals of the New York Academy of Sciences, 1986, vol. 480.
47. Gross, D. J. "On the Calculation of the Fine-Structure Constant." Phys. To-
day 42(12) (1989).
48. Haken, H., ed. Pattern Formation by Dynamic Systems and Pattern Recogni-
tion. Berlin: Springer, 1979.
49. Haken, H. Information and Self-Organization: A Macroscopic Approach to
Complex Systems. Berlin: Springer, 1988.
50. Hartle, J. B., and S. W. Hawking. "Wave Function of the Universe." Phys.
Rev. D 28 (1983):2960-2975.
51. Hartle, J. B. "Progress in Quantum Cosmology." Preprint from the Physics
Department, University of California at Santa Barbara, 1989.
52. Hawking, S. W. "Particle Creation by Black Holes." Commun. Math. Phys.
43 (1975):199-220.
53. Hawking, S. W. "Black Holes and Thermodynamics." Phys. Rev. 13 (1976):
191-197.
54. Hawking, S. W. "The Boundary Conditions of the Universe." In Astrophysical
Cosmology, edited by H. A. Brück, G. V. Coyne, and M. S. Longair. Vatican
City: Pontificia Academia Scientiarum, 1982, 563-574.
55. Heisenberg, W. "Über den anschaulichen Inhalt der quantentheoretischen
Kinematik und Mechanik." Zeits. f. Physik 43 (1927):172-198. English trans-
lation in WZ,148 pp. 62-84.
56. Herken, R., ed. The Universal Turing Machine: A Half-Century Survey. Ham-
burg: Kammerer & Unverzagt and New York: Oxford Univ. Press, 1988.
57. Hetherington, N. S. Science and Objectivity: Episodes in the History of As-
tronomy. Ames, IA: Iowa State Univ. Press, 1988.
58. Hobson, J. Allan. Sleep. Scientific American Library. New York: Freeman,
1989, 86, 89, 175, 185, 186.
59. Jastrow, R. Journey to the Stars: Space Exploration-Tomorrow and Beyond.
New York: Bantam, 1989.
145. Wheeler, J. A. "On Recognizing Law without Law." Am. J. Phys. 51 (1983):
398-404.
146. Wheeler, J. A. "Jenseits aller Zeitlichkeit." In Die Zeit, Schriften der Carl
Friedrich von Siemens-Stiftung, edited by A. Peisl and A. Mohler. München:
Oldenbourg, 1983, vol. 6, 17-34.
147. Wheeler, J. A. "Elementary Quantum Phenomenon as Building Unit." In
Quantum Optics, Experimental Gravitation, and Measurement Theory, edited
by P. Meystre and M. Scully. New York and London: Plenum, 1983, 141-143.
148. Wheeler, J. A., and W. H. Zurek. Quantum Theory and Measurement. Prince-
ton: Princeton Univ. Press, 1983.
149. Wheeler, J. A. "Bits, Quanta, Meaning." In Problems in Theoretical Physics,
edited by A. Giovannini, F. Mancini, and M. Marinaro. Salerno: Univ. of
Salerno Press, 1984, 121-141. Also in Theoretical Physics Meeting: Atti del
Convegno, Amalfi, 6-7 maggio 1983, Edizioni Scientifiche Italiane, Naples
(1984), 121-134. Also in A. Giovannini, F. Mancini, M. Marinaro, and A.
Rimini, Festschrift in Honour of Eduardo R. Caianiello, World Scientific, Sin-
gapore (1989).
150. Wheeler, J. A. "Quantum Gravity: The Question of Measurement." In Quan-
tum Theory of Gravity, edited by S. M. Christensen. Bristol: Hilger, 1984,
224-233.
151. Wheeler, J. A. "Bohr's 'Phenomenon' and 'Law without Law.'" In Chaotic
Behavior in Quantum Systems, edited by G. Casati. New York: Plenum, 1985,
363-378.
152. Wheeler, J. A. "'Physics as Meaning Circuit': Three Problems." In Frontiers of Non-
Equilibrium Statistical Physics, edited by G. T. Moore and M. O. Scully. New
York: Plenum, 1986, 25-32.
153. Wheeler, J. A. "Interview on the Role of the Observer in Quantum Mechan-
ics." In The Ghost in the Atom, edited by P. C. W. Davies and J. R. Brown.
Cambridge: Cambridge Univ. Press, 1986, 58-69.
154. Wheeler, J. A. "How Come the Quantum." In New Techniques and Ideas in
Quantum Measurement Theory, edited by D. M. Greenberger. Ann. New York
Acad. Sci. 480 (1987):304-316.
155. Wheeler, J. A. "Hermann Weyl and the Unity of Knowledge." In Exact Sci-
ences and Their Philosophical Foundations, edited by W. Deppert et al. Frank-
furt am Main: Lang, 1988, 469-503. Appeared in abbreviated form in Am.
Scientist 74 (1986):366-375.
156. Wheeler, J. A. "World as System Self-Synthesized by Quantum Networking."
IBM J. Res. & Dev. 32 (1988):4-25. Reprinted in E. Agazzi, ed., Probability
in the Sciences, Kluwer, Amsterdam (1988), 103-129.
157. Wheeler, J. A. A Journey into Gravity and Spacetime. Scientific American
Library. New York: Freeman, 1990.
158. White, M. Science and Sentiment in America: Philosophical Thought from
Jonathan Edwards to John Dewey. New York: Oxford Univ. Press, 1972.
upon its surface is a signal. According to the code known as "written English" this
signal corresponds to a message (which includes, for instance, a description of the
communication process). This general picture of communication includes both the
notion of information transfer and the notion of information storage and retrieval.
Information theory as formulated by Shannon takes an essentially statistical
approach to this process. A particular message xi is chosen with probability p(xi )
from an abstract set X of possible messages. The information content of the message
is given by the information function
H(X) = -∑_i p(x_i) log p(x_i) . (1)
(All logarithms have base 2, so that H(X) is in "bits.") H(X) can be viewed as
a measure of the receiver's uncertainty about X before the signal is transmitted.
After the transmission, the receiver has examined the channel with result yk (from
a set Y of possible results) and ascribes a conditional probability p(xilyk) to each
possible message. If the channel is "noisy," the receiver may still have a non-zero
degree of uncertainty about the message X—on average, an amount
H(X|Y) = H(X,Y) - H(Y) , (2)
where H(X,Y) and H(Y) are defined by the joint distribution for X and Y and the
marginal distribution for Y, respectively. Thus, the receiver has gained an amount
of information
H(X : Y) = H(X) - H(X|Y) . (3)
p(x_i, a) = p(x_i) Tr π_a ρ(x_i). From this distribution the mutual information H(X :
A) can be calculated using Eq. 3.
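These Shannon quantities can be sketched numerically; the joint distribution below (a slightly noisy binary channel) is an illustrative assumption, not one taken from the text:

```python
import math

def entropy(probs):
    """Shannon information function: H = -sum p log2 p, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative joint distribution p(x, y) for a slightly noisy binary channel.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = [0.5, 0.5]   # marginal distribution of the message X
p_y = [0.5, 0.5]   # marginal distribution of the received result Y

H_X = entropy(p_x)
H_XY = entropy(joint.values())
H_X_given_Y = H_XY - entropy(p_y)   # residual uncertainty about the message
gained = H_X - H_X_given_Y          # mutual information H(X : Y), Eq. 3
print(round(gained, 3))
```

With these numbers the receiver gains about 0.28 bits per symbol; a noiseless channel would deliver the full H(X) = 1 bit.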
The ensemble of possible messages gives rise to an ensemble of possible signals.
This ensemble is described by the density operator
ρ = ∑_i p(x_i) ρ(x_i) . (4)
ρ correctly predicts the ensemble average of any quantum observable. For example,
the average signal energy is ⟨E⟩ = Tr ρH. The entropy of the signal ensemble,
defined by
S[ρ] = -Tr ρ log ρ , (5)
is a quantity with obvious analogies to the information function H(X), which is
in fact frequently called the "entropy." However, the two are quite different. In-
formation is a semantic quantity, a function of the abstract ensemble of possible
messages. Entropy is a physical quantity with a thermodynamic meaning. The re-
lation between the two is a key issue in the physics of information.
A particularly deep insight into this question is provided by a theorem of
A. S. Kholevo5 which sets a bound on H(X : A), the amount of information con-
veyed by the quantum channel Q. Kholevo showed that
H(X : A) ≤ S[ρ] - ∑_i p(x_i) S[ρ(x_i)] , (6)
with equality only if the signal states ρ(x_i) all commute with one another. Since the
subtracted term on the right is non-negative, it trivially follows that
H(X : A) ≤ S[ρ]. That is, a quantum channel Q can deliver an amount of
information no greater than the entropy of the ensemble of signals.
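Kholevo's bound is easy to evaluate for a concrete ensemble. The sketch below assumes two equiprobable, non-orthogonal pure qubit signal states (an illustrative choice) and computes the bounding quantity S[ρ] - ∑_i p(x_i) S[ρ(x_i)]:

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy S[rho] = -Tr rho log2 rho, in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

# Two equiprobable, non-orthogonal pure signal states (illustrative assumption).
v0 = np.array([1.0, 0.0])
v1 = np.array([np.cos(np.pi / 8), np.sin(np.pi / 8)])
rho0, rho1 = np.outer(v0, v0), np.outer(v1, v1)
rho = 0.5 * rho0 + 0.5 * rho1   # density operator of the signal ensemble

# Kholevo's bound on the accessible information: S[rho] - sum_i p_i S[rho_i].
chi = vn_entropy(rho) - 0.5 * vn_entropy(rho0) - 0.5 * vn_entropy(rho1)
print(round(chi, 3))   # well under one bit: the two states do not commute
```

The pure signal states contribute zero entropy, so the bound is just S[ρ], about 0.23 bits here, far less than the one bit the sender invested in choosing a message.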
I should remark that the model of measurement used by Kholevo in the proof
of this theorem is a very general one. He assumes that the decoding observable A
is a positive-operator-valued (POV) measure; that is, each measurement outcome
a is associated with a positive operator π_a in H_Q for which
∑_a π_a = 1 . (7)
The probabilities for the various measurement outcomes are given by the usual
quantum trace rule. For an ordinary measurement, the π_a's are projections—that is,
ordinary measurements are projection-valued (PV) measures. The POV measures
clearly include the PV measures as a subset.3
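For concreteness, here is a sketch of a POV measure that is not projection-valued: the three "trine" elements below (a standard construction, assumed here purely for illustration) are positive, sum to the identity as in Eq. 7, and give outcome probabilities by the trace rule:

```python
import numpy as np

# Three "trine" POVM elements pi_a = (2/3)|v_a><v_a| at 120-degree spacing
# on the Bloch circle (illustrative; not from the text).
angles = [0, 2 * np.pi / 3, 4 * np.pi / 3]
pis = []
for t in angles:
    v = np.array([np.cos(t / 2), np.sin(t / 2)])
    pis.append((2 / 3) * np.outer(v, v))

# Completeness: the elements sum to the identity, as Eq. 7 requires.
print(np.allclose(sum(pis), np.eye(2)))

# Outcome probabilities for a state rho via the trace rule; they sum to 1.
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
probs = [np.trace(p @ rho).real for p in pis]
print(np.isclose(sum(probs), 1.0))
```

Each element has rank one but trace 2/3, so none is a projection; this three-outcome measurement on a two-dimensional system is exactly the kind of decoding observable Kholevo's theorem admits.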
32 Benjamin Schumacher
CHANNEL CAPACITY
One consequence of Kholevo's theorem is that simple quantum channels cannot
hold an unlimited amount of information. Suppose that dim H_Q = N. It is al-
ways possible to increase H(X) by increasing the number of possible messages and
signals. Further, since we might allow POV measures in our class of observable
quantities, there is no limit to the number of measurement outcomes and hence no
limit to H(A). In other words, the sender can attempt to put as much information
as he wishes into the channel, and the receiver can attempt to acquire as much
information as he wishes from the channel. However, the entropy of the signal en-
semble is bounded by S[ρ] ≤ log N. Therefore, by Kholevo's theorem no possible
coding-decoding scheme can use the channel Q to convey a quantity of information
H(X : A) greater than log N. A spin-1/2 system, for example, has an information
capacity of just one bit.
This is intuitively satisfying, since we sometimes think of a spin-1/2 system as
a "two-state" system. But in fact there are an infinite number of states of a spin-
1/2 system, one for each point on the Stokes sphere (pure states) or in its interior
(mixed states). An unlimited amount of information can be coded in the spin state.
Nevertheless, the quantum state of the spin is not an observable, and the accessible
information can be no larger than a single bit.
On the other hand, since the receiver can choose the decoding observable, he has
a choice about which part of the coded information to access. This can be illustrated
by Wiesner's quantum multiplexing.10 Imagine that Q is a spin-1/2 system, and let
|+⟩ and |−⟩ be the eigenstates of σ_z. The idea is to code two distinct one-bit
messages X and Y into the channel Q. Four possible joint messages (XY = 00, 01,
11, or 10) are coded in the following four signal states:
where θ = π/8. If each message has probability 1/4, then the message information
H(XY) is two bits.
No observable can read both bits, but it is possible to read something of either
bit by a suitable choice of measurement. If the receiver measures σ_z, for example,
he can read the first bit X with an error probability of about 15%, though he learns
nothing about the second bit. That is, H(X : σ_z) = .4 bits and H(Y : σ_z) = 0. Similarly,
a measurement of σ_y yields .4 bits of information about Y but no information about
X. In each case less than one bit is received, but this deficiency can be overcome in
a long sequence of messages by the use of redundancy and error-correcting codes.
Two distinct messages can thus be coded into the complementary observables σ_z and
σ_y; the receiver can read either one, but not both.
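Taking θ = π/8 and modeling the σ_z readout as a binary symmetric channel with uniform inputs (an illustrative simplification), the roughly 15% error probability and the .4-bit figure quoted above can be reproduced:

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

theta = math.pi / 8
p_err = math.sin(theta) ** 2     # readout error probability, about 15%
info = 1.0 - h2(p_err)           # bits per signal over a symmetric binary channel
print(round(p_err, 3), round(info, 2))
```

Since sin²(π/8) ≈ 0.146, the mutual information 1 − h2(0.146) comes out to about 0.4 bits, matching the figure in the text.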
Information from Quantum Measurements 33
Notice that even the sum of the mutual informations in this example is less
than one bit. This is not accidental and is an expression of the complementarity of
the decoding observables. Maassen and Uffink7 have shown that, for any complete
ordinary (PV measure) observables A and B and any state ρ,
H(A|ρ) + H(B|ρ) ≥ C = -log ( sup_{i,j} |⟨a_i|b_j⟩|² ) , (9)
where |a_i⟩ and |b_j⟩ are eigenstates of A and B, respectively. Eq. 9 amounts to an
information-theoretic uncertainty relation for A and B, and is the strongest such
inequality yet derived for finite-state quantum systems. If dim H_Q = N, then for
any message X coded into Q,
H(X : A) + H(X : B) = H(A) + H(B) - [H(A|X) + H(B|X)]
≤ 2 log N - C . (10)
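As a sketch of Eq. 9 at work: for the complementary qubit observables σ_z and σ_x, every overlap |⟨a_i|b_j⟩|² equals 1/2, so C = 1 bit, and the two measurement entropies must total at least one bit for any state. A numerical spot-check on a randomly chosen pure state (an illustrative assumption):

```python
import numpy as np

def entropy_bits(probs):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(probs)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

# A randomly chosen pure qubit state (illustrative).
rng = np.random.default_rng(0)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)

# Outcome distributions for sigma_z (computational basis) and sigma_x.
p_z = np.abs(psi) ** 2
x_states = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
p_x = np.abs(x_states @ psi) ** 2

C = 1.0  # -log2 sup|<a_i|b_j>|^2 = -log2(1/2) for these two bases
total = entropy_bits(p_z) + entropy_bits(p_x)
print(total >= C)   # the Maassen-Uffink bound holds
```

Equality is approached only for an eigenstate of one of the two observables; a generic state keeps both distributions spread out, so the sum exceeds one bit.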
That is, the measurement of A on subsystem Q1 does not affect the statistics of any
measurement on Q2. (This is exactly the statement of locality for quantum measure-
ment theory; if it were not true, it would be possible to use quantum correlations
to send signals faster than the speed of light or into the past!)
Although there is no question of communication in this situation, there is a
formal similarity between quantum communication and quantum correlation. The
A-measurement outcomes correspond to the possible messages; for each "message"
a there is a "message probability" p(a) and a "signal state" ρ_2(a) of the "channel"
Q2. A measurement of B on the channel provides an amount of information about
A limited by Kholevo's theorem:
where P is the signal power and R is the information transmission rate. In other
words, the power requirement increases as the square of the information rate.
Kholevo's theorem sheds light on these questions by providing a limit for the
amount of accessible information that may be represented by the state of a partic-
ular quantum channel. Thus,
ACKNOWLEDGMENTS
This paper is drawn from the Ph.D. thesis work I have done under the direction
of Prof. John A. Wheeler at the University of Texas, and I would like to acknowl-
edge his continuing help and inspiration. In addition, I am greatly indebted to Bill
Wootters, Charles Bennett, Carlton Caves, Murray Gell-Mann, Leonid Khalfin, and
Wojciech Zurek for their comments and suggestions. I also wish to thank the Santa
Fe Institute for hospitality and support during the workshop.
REFERENCES
1. Bekenstein, J. D. Phys. Rev. D 23 (1981):287 ff.
2. Bennett, C. H. "The Thermodynamics of Computation—a Review." Intl. J.
Theor. Phys. 21 (1982):905-940.
3. Busch, P. Intl. J. Theor. Phys. 24 (1985):63-91.
4. Everett, Hugh, III, "The Theory of The Universal Wave Function." In The
Many-Worlds Interpretation of Quantum Mechanics, edited by DeWitt, Bryce
S. and Graham, Neill. Princeton: Princeton University Press, 1973, 3-137.
The conjecture is found on page 51.
5. Kholevo, A. S. "Bounds for The Quantity of Information Transmitted by
a Quantum Communication Channel." Problemy Peredachi Informatsii 9
(1973):3-11. This journal is translated by IEEE under the title Problems of
Information Transfer.
6. Landauer, R. IBM J. Research 3 (1961):183-191.
7. Maassen, H., and J. B. M. Uffink. "Generalized Entropic Uncertainty Rela-
tions." Phys. Rev. Lett. 60 (1988):1103-1106.
8. Pendry, J. B. J. Phys. A16 (1983):2161 ff.
9. Shannon, C. E. Bell System Technical Journal 27 (1948):379, 623.
10. Wiesner, S. SIGACT News 15 (1983):78-88.
11. Zurek, W. H., "Reversibility and Stability of Information Processing Sys-
tems." Phys. Rev. Lett. 53 (1984):391-394.
William K. Wootters
Santa Fe Institute, 1120 Canyon Road, Santa Fe, New Mexico 87501; Center for Nonlin-
ear Studies and Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM
87545; Permanent address: Department of Physics, Williams College, Williamstown, Mas-
sachusetts 01267
We show now that the three probabilities obtained in this way are sufficient
to determine the density matrix. Let P1, P2, and P3 be projection operators that
project onto the states selected by the three filters. Then the probability that a pho-
ton will pass through filter i is p_i = tr(P_i ρ), where ρ is the beam's density matrix.
The trace tr(P_i ρ) can be thought of as an inner product on the space of Hermitian
operators, so that we can think of p_i as the length of the projection of ρ along P_i.
We also know that the quantity tr(Iρ), where I is the identity matrix, is equal to
unity. We thus have the projections of ρ along four "vectors," namely, P1, P2, P3,
and I. The space of 2 x 2 Hermitian matrices is four-dimensional. Therefore, as long
as these four "vectors" are linearly independent, the four projections will uniquely
determine the density matrix. One can verify that for the three filters mentioned
above, the three Pi's and I are indeed linearly independent. On the other hand,
it would not do to use three linearly polarizing filters oriented at different angles,
since the associated projectors, together with the identity, do not constitute a set
of linearly independent matrices.
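The linear-independence argument can be rehearsed numerically: expand ρ in four basis matrices and solve for it from the measured probabilities. The concrete filter set below, two linear polarizations and one circular expressed through Stokes vectors, is an illustrative assumption:

```python
import numpy as np

# Pauli matrices and identity.
I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def projector(n):
    """Projector (I + n.sigma)/2 selecting the polarization with Stokes vector n."""
    return 0.5 * (I2 + n[0] * sx + n[1] * sy + n[2] * sz)

# Three filters (illustrative choice): two linear polarizations and one circular.
Ps = [projector(n) for n in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]

# An arbitrary "unknown" beam density matrix to be reconstructed.
r = [0.3, -0.5, 0.2]
rho_true = 0.5 * (I2 + r[0] * sx + r[1] * sy + r[2] * sz)

# Measured numbers: p_i = tr(P_i rho), plus the normalization tr(I rho) = 1.
ops = Ps + [I2]
vals = [np.trace(P @ rho_true).real for P in Ps] + [1.0]

# Expand rho = c1 sx + c2 sy + c3 sz + c4 I and solve the four linear equations.
basis = [sx, sy, sz, I2]
A = np.array([[np.trace(P @ B).real for B in basis] for P in ops])
c = np.linalg.solve(A, vals)
rho_rec = sum(ci * B for ci, B in zip(c, basis))
print(np.allclose(rho_rec, rho_true))
```

Had all three filters been linear polarizers, the matrix A would be singular and the solve would fail, which is the numerical face of the linear-dependence obstruction described above.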
We now need to figure out how many independent probabilities one obtains
from this scheme. There are nine different joint measurements that can be made
on a photon pair, that is, nine possible combinations of filters. Each of these mea-
surements has four possible outcomes: yes-yes, yes-no, no-yes, and no-no, where
"yes" means that the photon passes through. Thus one obtains 9 x 4 = 36 different
probabilities. But of course these probabilities are not all independent. For each
measurement, the probabilities of the four outcomes must sum to unity. Moreover,
the unconditional probability that a photon on one side will pass through a given
filter cannot depend on which filter the corresponding photon on the other side
happens to encounter. Quantum mechanics forbids such dependence, and indeed
such a dependence could be used to send signals faster than light. Given these
restrictions, one can convince oneself that the following probabilities constitute a
complete set of independent probabilities, in the sense that all other probabilities
can be computed from them:
p(R_i),   i = 1, 2, 3
p(L_j),   j = 1, 2, 3
p(R_i, L_j),   i, j = 1, 2, 3
Here p(R_i) is the overall probability that a right-moving photon encountering the
ith filter will pass through it, and p(R_i, L_j) is the probability that a pair of photons
encountering filters i and j (filter i on the right and filter j on the left) will both
pass through. The number of independent probabilities is thus 3 + 3 + 9 = 15, which
is precisely the number we needed to determine the density matrix. Thus our naive
scheme appears promising.
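The stronger check called for in the next paragraph, that the measured operators be linearly independent within quantum mechanics itself, can be sketched numerically; the single-photon filter projectors below are an illustrative choice:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
paulis = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]

# Three single-photon filter projectors (illustrative choice).
Ps = [0.5 * (I2 + s) for s in paulis]

# The 15 two-photon operators whose expectations the scheme measures,
# plus the identity for normalization.
ops = ([np.kron(P, I2) for P in Ps] +                 # p(R_i)
       [np.kron(I2, P) for P in Ps] +                 # p(L_j)
       [np.kron(Pi, Pj) for Pi in Ps for Pj in Ps] +  # p(R_i, L_j)
       [np.kron(I2, I2)])

# Sixteen linearly independent operators span the 16-dimensional space of
# 4x4 Hermitian matrices, so the probabilities fix rho uniquely.
M = np.array([op.flatten() for op in ops])
print(np.linalg.matrix_rank(M))
```

A rank of 16 means no measured probability is redundant, which is exactly the question the counting argument alone cannot settle.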
However, it is not enough to check that one has the right number of logically
independent probabilities. Two probabilities that are logically independent might
still be related by some restriction within quantum mechanics itself, in which case
one of them is actually redundant and does not contribute to determining the
density matrix. To make sure we have enough information to determine the density
Local Accessibility of Quantum States 43
matrix, we need to go through the kind of argument we used in the case of single
photons. Let P_i, i = 1, 2, 3, be the projectors onto the states selected by the three
kinds of filter, just as before. Then the 15 probabilities listed above are related to
the density matrix by the following equations:
p(R_i) = tr[(P_i ⊗ I)ρ]
p(L_j) = tr[(I ⊗ P_j)ρ]
p(R_i, L_j) = tr[(P_i ⊗ P_j)ρ]
DISCUSSION
The above statement actually contains two noteworthy facts.
The first is that measurements on the parts are sufficient for determining the
state of the whole. This is not a trivial result. Indeed, the conclusion would not hold
if the world were run according to real-vector-space quantum mechanics rather than
complex quantum mechanics. To see this, consider a composite system consisting
of two parts, each having two orthogonal states. In real-vector-space quantum me-
chanics, we can think of this system as a pair of photons, where each photon is
allowed to have only linear, and not elliptical polarization. Such a restriction is the
result of allowing only real amplitudes. Let ρ be any density matrix for the composite system, that is, any 4 × 4 real symmetric matrix with unit trace and non-negative
eigenvalues. Then consider any other density matrix of the form ρ' = ρ + b(σy ⊗ σy),
where b is a real number and σy is the Pauli matrix (0 −i; i 0). (For some ρ,
every non-zero b will cause one of the eigenvalues of ρ' to be negative and is therefore
not allowed. However, for a typical ρ there will be a range of allowed b's. It is
the latter case that we consider here.) I show now that the value of b cannot be
determined by any set of measurements performed on the subsystems: The prob-
abilities obtained from such measurements will always be related to the density
matrix through an equation of the form p = tr[(P ⊗ Q)ρ], where P and Q are projectors
on the two-dimensional spaces of the individual photons. It turns out that
tr[(P ⊗ Q)(σy ⊗ σy)] is always zero, so that these probabilities will never depend
on b, and therefore the value of b cannot be determined by such measurements.
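The vanishing of this trace can be checked directly: tr[(P ⊗ Q)(σy ⊗ σy)] = tr(Pσy) tr(Qσy), and tr(Pσy) = 0 for any real symmetric P. A numerical sketch (function and variable names are mine):

```python
import numpy as np

# Numerical illustration: for real, rank-1 polarization projectors P and Q,
# tr[(P x Q)(sigma_y x sigma_y)] vanishes identically, since
# tr(P sigma_y) = 0 for any real symmetric P.
rng = np.random.default_rng(0)
sigma_y = np.array([[0.0, -1.0j], [1.0j, 0.0]])

def real_projector(theta):
    """Projector onto the linear-polarization state (cos theta, sin theta)."""
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(v, v)

for _ in range(100):
    P = real_projector(rng.uniform(0.0, np.pi))
    Q = real_projector(rng.uniform(0.0, np.pi))
    val = np.trace(np.kron(P, Q) @ np.kron(sigma_y, sigma_y))
    assert abs(val) < 1e-12    # so local probabilities never depend on b
```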
The two cases mentioned above, namely, quantum mechanics with g(N) = N² − 1
and classical probability theory with g(N) = N − 1, both satisfy this condition.
So does any hypothetical theory with g(N) = N^k − 1, where k is a non-negative
integer. Thus quantum mechanics is not unique in this respect, but its g(N) belongs
to a rather special class of functions.
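The condition itself falls outside this excerpt; in Wootters' argument it amounts to requiring that local parameter counts multiply, g(N1·N2) = g(N1)g(N2) + g(N1) + g(N2), i.e., that 1 + g(N) be multiplicative. Under that assumption, a quick check that g(N) = N^k − 1 qualifies:

```python
# Assumed form of the condition (transcribed from the surrounding argument,
# not printed in this excerpt): g(N1*N2) = g(N1)*g(N2) + g(N1) + g(N2),
# i.e., 1 + g(N) is multiplicative.
def g(N, k):
    return N**k - 1

for k in range(4):               # k = 1: classical; k = 2: quantum mechanics
    for N1 in range(1, 7):
        for N2 in range(1, 7):
            assert g(N1 * N2, k) == g(N1, k) * g(N2, k) + g(N1, k) + g(N2, k)
```

Since (1 + g)(N) = N^k is multiplicative by construction, the identity holds for every non-negative integer k.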
Let me finish with a story. A man came across a beam of photons and decided
to measure their polarization. He made only those measurements that he needed to
make, but so as not to waste any information, he also recorded which photon gave
which result. (He identified them by their times of arrival.) Somewhere far away, a
woman came across a similar beam and performed the same procedure. Later, the
two observers met and were told by a third person that the photons which they had
observed were actually produced in pairs at a common source. On looking back at
their records, they discovered that they possessed precisely the information they
needed for reconstructing the polarization state of the photon pairs. They were
pleased, of course, but they also wondered what the meaning of this good fortune
might be.
46 William K. Wootters
ACKNOWLEDGMENTS
I would like to thank Ted Jacobson and Ben Schumacher for a number of ideas
that have found their way into the paper. I would also like to thank the two groups
in Los Alamos' Theoretical Division that have contributed to the support of this
work: Complex Systems T-13 and Theoretical Astrophysics T-6.
REFERENCES
1. Bell, J. S. "On the Einstein Podolsky Rosen Paradox." Physics 1 (1964):195.
2. Freedman, S. J., and J. F. Clauser. "Experimental Test of Local Hidden-
Variable Theories." Phys. Rev. Lett. 28 (1972):938.
V. F. Mukhanov
Santa Fe Institute, Santa Fe, NM, U.S.A.; permanent address: Institute for Nuclear
Research, Moscow 117312, U.S.S.R.
ω1 + ω2 = M . (1)
(We will use the units in which c = ħ = G = k = 1.) Another possible way to
form a black hole with the same mass uses three radiation quanta with frequencies
ω1, ω2, and ω3 (ω1 + ω2 + ω3 = M), etc. If there are no restrictions on the
quanta frequencies, then the number of possible ways to form a black hole with
FIGURE 1 The different ways to form a black hole at level n.
FIGURE 2 The ordered partitions 3 = 1+1+1, 3 = 1+2, 3 = 2+1, and 3 = 3 correspond to the four ways to form a black hole at level n = 3: F(3) = 4 = 2².
number "two," etc., up to level n. Another possible way to create a black hole with
quantum number n is to do it without intermediate transitions. The different ways
to form a black hole at the level n are depicted in Figure 1. There is a one-to-one
correspondence between the ways to form a black hole at level n and the subdivisions
of the integer n into ordered sums of integers. For the particular case n = 3
this correspondence is explained in Figure 2. Thus, the number of possible ways to
create a black hole at level n from the given matter, and consequently the number
of different internal configurations of this black hole, is equal to the number of
ordered subdivisions of the integer n. It is very easy to verify that this number
F(n) is equal to
F(n) = 2^(n−1) . (2)
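The count is easy to verify by enumeration: each ordered composition of n corresponds to choosing "cut" or "no cut" at each of the n − 1 internal boundaries in a row of n units, giving 2^(n−1) possibilities. A sketch (the enumeration code is mine):

```python
from itertools import product

# Each ordered composition of n corresponds to choosing "cut" or "no cut" at
# each of the n - 1 boundaries in a row of n units, giving 2^(n-1) ways.
def ordered_compositions(n):
    results = []
    for cuts in product([0, 1], repeat=n - 1):
        parts, current = [], 1
        for c in cuts:
            if c:
                parts.append(current)
                current = 1
            else:
                current += 1
        parts.append(current)
        results.append(tuple(parts))
    return results

for n in range(1, 11):
    assert len(ordered_compositions(n)) == 2 ** (n - 1)    # F(n) = 2^(n-1)

print(ordered_compositions(3))   # [(3,), (2, 1), (1, 2), (1, 1, 1)]
```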
Then the entropy of a black hole with quantum number n is
S(n) = ln F(n) = (n − 1) ln 2 . (3)
Identifying this entropy with a quarter of the horizon area (in Planck units),
S(n) = A(n)/4 + const , (4)
we arrive at the quantization rule
A(n) = 4n ln 2 . (5)
The constant in Eq. (4) was chosen such that there is no black hole when n = 0.
Thus, we found that the area of the quantized black hole is proportional to an integer
number of Planck areas. The minimal possible increase of the black hole area
is ΔA_min = 4 ln 2, in full correspondence with Bekenstein's result.2
If the black hole is quantized, then it is rather natural to consider the Hawking
radiation as a result of spontaneous quantum jumps of this black hole from one level
to other ones. It is natural also to have the maximum probability for the transition
to the nearest level (n → n − 1). As a result of such a transition, the black hole emits
a quantum of some physical field with energy ω_{n,n−1} = M(n) − M(n − 1), charge
q = Q(n) − Q(n − 1), and angular momentum ℓ = J(n) − J(n − 1). Using the first
law of the black hole thermodynamics
dM = (1/4) T_bh dA + Φ dQ + Ω dJ , (6)
and taking into account the quantization rule (5), after substitutions
(the equality dF ≈ F(n) − F(n − 1) is true for n ≫ 1), we find that for large n
(n ≫ 1) the parameters of this quantum satisfy the condition
FIGURE 3 The different levels for the nonrotating black hole without charge (Q =
J = 0). In this case, A_n ∝ n and M_n ∝ √n. Each level has finite width
W_n = γ ΔM_{n,n−1}. If γ ≪ 1, then the hypothesis about different levels is justified.
the effective action. In the first approximation, the imaginary part is proportional
to squared curvature invariants. Thus, for the width of the level n we have
where C_iklm is the Weyl tensor, ΔM_{n,n−1} is the distance between the levels n and
n − 1, and the coefficient γ characterizes the relative width of the levels (if γ < 1,
then W_n < ΔM_{n,n−1}). See also Figure 3.
The lifetime of the black hole at the level n is
τ_n ≈ 1/W_n . (9)
Then it is easy to estimate the mass loss of the black hole because of its evaporation:
dM/dt ≈ −2.8 ln 2 · γ/M² . (10)
Comparing this formula with the corresponding Hawking formulae, we find that for
the massless scalar field γ_sc.f. = 1/30; the increase of the coefficient γ due to the
other fields may not be so significant.5 Therefore, the hypothesis about black hole
levels is justified (at least for sufficiently large black holes which emit only massless
quanta).
It is worth noting one of the most interesting consequences of black hole quanti-
zation. The black hole cannot absorb a radiation quantum whose wavelength is larger
than the black hole size, because of the finite distance between nearest levels.
ACKNOWLEDGMENTS
I would like to thank W. Unruh and W. Zurek for useful conversations. This research
was supported in part by the Santa Fe Institute. I am also very grateful to Ronda
Butler-Villa for her technical assistance in preparing this document for publication.
REFERENCES
1. Bekenstein, J. D. Phys. Rev. D7 (1973):2333.
2. Bekenstein, J. D. Phys. Today 33(1) (1980):24.
3. Hawking, S. W. Nature 248 (1974):30.
4. Hawking, S. W. Comm. Math. Phys. 45 (1975):9.
5. Page, D. N. Phys. Rev. D13 (1976):198.
6. Zurek, W. H., and K. S. Thorne. Phys. Rev. Lett. 54(20) (1985):2171.
Shin Takagi
Department of Physics, Tohoku University, Sendai 980, Japan
One cannot see beyond the horizon, but on earth, one can still communicate
across it. When it comes to a spacetime horizon, such a communication is im-
possible and one necessarily loses information on that part of the spacetime that
is beyond the horizon. Under such circumstances, some unexpected consequences
could arise. The most striking consequence is discussed in the theory developed
by Wheeler,25 Bekenstein,1,2 Hawking,12 and many others that a black hole is a
thermodynamic object, as discussed by other presenters at this workshop. A closely
related situation occurs when an atom is uniformly accelerated. The purpose of my
chapter is to briefly sketch this remarkable theoretical development that emerged
from the works of Fulling,11 Davies,7 Unruh,22 and others,4,8,16 and
discuss its relationship with apparently unconnected subjects, thus elucidating a
modest but perhaps hitherto unsuspected network among some of the well-known
ideas in theoretical physics.
UNIFORM ACCELERATION
To begin, what is a uniformly accelerated observer? At each instant, a special frame
of reference (t', x', y', z') can be chosen so that the observer is momentarily at rest with
respect to it. Suppose he moves according to the Galilean law of a falling body:
x' = (1/2) g t'² . (1)
If this is the case at every instant, the observer is said to be uniformly accelerated,
with acceleration g. Described in the global frame of reference (t, x, y, z), his world
line is a hyperbola13,16
x² − (ct)² = (c²/g)² , (2)
where c is the velocity of light. No signal beyond the null ray x = ct can reach him;
this ray acts as a spacetime horizon.
Here 1/ω is a kinematical factor, and ω² comes from the density of states. This
function satisfies the detailed-balance relation (or Kubo-Martin-Schwinger condition)
F(ω) = e^(−ω/T) F(−ω) . (5)
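Eqs. (3) and (4) fall outside this excerpt; assuming the response takes the standard Planck form F(ω) = (1/2π) ω/(e^(ω/T) − 1), the KMS relation (5) can be verified numerically:

```python
import math

# Numerical check of the KMS/detailed-balance relation (5), assuming the
# response function has the Planck form F(w) = (1/2pi) w / (e^(w/T) - 1).
# (Eq. (4) itself is not reproduced above, so this form is an assumption.)
T = 0.7   # temperature in natural units; arbitrary illustrative value

def F(w):
    return (1.0 / (2.0 * math.pi)) * w / math.expm1(w / T)

for w in [0.1, 1.0, 3.7]:
    assert abs(F(w) - math.exp(-w / T) * F(-w)) < 1e-12
```

The relation holds identically, since ω/(e^(ω/T) − 1) = e^(−ω/T) · (−ω)/(e^(−ω/T) − 1).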
Consequences of Loss of Information in Spacetime with Horizon 55
EPR CORRELATION
First, the phenomenon is a result of a kind of Einstein-Podolsky-Rosen correlation,10
as has been noted by Unruh and Wald23 and others.21 As reformulated by Bohm,5
Bell,3 and others, EPR correlation manifests itself in a pure spin-state of a two-
particle system with each particle of spin 1/2. To define such a pure state, one
needs information on both of the particles. If, however, one looks at one of the
particles alone, its spin can be described equally well by an ensemble. Nevertheless,
the correlation between the spins of the two particles can be detected by measuring
the spins of these particles independently, even if the measurements occur at places
with space-like separation. Coming back to the present problem, the vacuum state
of the quantum field corresponds to a pure spin-state of the two-particle system.
To the spin of one of the particles correspond those degrees of freedom of the field
in the region z > |t| (to be called region I), and to the spin of the other particle
correspond those degrees of freedom of the field in the region z < −|t| (to be
called region II). The definition of the vacuum requires information on the field
in both regions I and II. However, the uniformly accelerated observer "sees" only
those degrees of freedom in region I. Therefore, results of his measurements can be
described by an ensemble of states. Since regions I and II are space-like apart from
each other, this is a typical case of the underlying EPR correlation.
F(ω) = (1/4π) · 1/(e^(ω/T) + 1) , (6)
where T is the same as in Eq. (3). Note that the density of states in the n-dimensional
Minkowski spacetime is proportional to ω^(n−2), which explains the numerator.
But in contrast to Eq. (4), the distribution function here is that of Fermi-Dirac,
although we are dealing with photons (i.e., bosons) in both cases. This result
has been confirmed by Unruh,24 and also found independently by Stephens.18 How
can one make sense of this apparently paradoxical result?
HUYGENS' PRINCIPLE
It is well known that Huygens' principle is valid only in even-dimensional space-
times.6 In a sense, this fact also relates to a loss of information, because an in-
habitant of the three-dimensional spacetime sees only a shadow (or projection) of
the four-dimensional spacetime in which it can be embedded. With some technical
preparation, such as the KMS condition and the fluctuation-dissipation relation,
one can show that this circumstance is closely related to the present problem of
the apparent inversion of statistics.14,21 But here I shall point out yet another
unsuspected connection.15
F(ω) = (1/4π) · D(ω)/[ω (e^(ω/T) − 1)] , (7)
sense. But our temperature is related to the position ξ of the uniformly accelerated
observer at t = 0 as
T = 1/(2πξ) . (9)
(ξ = 1/g; see Eq. (2).) If we consider a family of uniformly accelerated observers
with various accelerations, we can associate a different temperature to each observer
according to Eq. (9). The world lines of this family of observers cover the entire
region I. Indeed, introducing the coordinates (η, ξ) instead of (t, x), where ξη is
the proper time of the observer whose position at t = 0 is ξ, one can convert the
spacetime metric
ds² = dt² − dx² − dy² (10)
to the form
ds² = ξ² dη² − dξ² − dy² . (11)
In view of these considerations, one may be inclined to regard Eq. (8) as a "local
density of states" at ξ:
D(ω) = ω tanh(πωξ) . (12)
Unfortunately, the spatial inhomogeneity of region I, as manifest in Eq. (11), pre-
vents one from defining the density of states per unit volume or a unique local
density of states.
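The consistency of Eqs. (6), (7), and (12) rests on the identity tanh(ω/2T)/(e^(ω/T) − 1) = 1/(e^(ω/T) + 1): inserting D(ω) into the Bose-type form of Eq. (7) reproduces the Fermi-Dirac factor of Eq. (6). A quick numerical check (variable names and sample values are mine):

```python
import math

# Inserting the "local density of states" of Eq. (12), D(w) = w tanh(pi w xi),
# into the Bose-type form of Eq. (7) reproduces the Fermi-Dirac factor of
# Eq. (6), via tanh(w/2T) / (e^(w/T) - 1) = 1 / (e^(w/T) + 1).
xi = 0.9                        # observer's position at t = 0 (illustrative)
T = 1.0 / (2.0 * math.pi * xi)  # its temperature, Eq. (9)

def D(w):
    return w * math.tanh(math.pi * w * xi)

for w in [0.2, 1.0, 5.0]:
    bose_with_D = D(w) / (w * math.expm1(w / T))   # Eq. (7), common prefactor dropped
    fermi = 1.0 / (math.exp(w / T) + 1.0)          # Eq. (6), common prefactor dropped
    assert abs(bose_with_D - fermi) < 1e-12
```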
In spite of its appearance, the spatial section of this metric is homogeneous. Indeed,
an appropriate coordinate transformation from (ξ, y) to (χ, φ) gives
(dξ² + dy²)/ξ² = dχ² + sinh²χ dφ² , (14)
which is the metric of the two-hyperboloid. Eq. (13) thus describes the three-dimensional
open Einstein universe. Furthermore, an analytic continuation χ → iθ
transforms Eq. (14) to
dθ² + sin²θ dφ² , (15)
that is, the metric of the two-sphere.
where ν is the frequency with respect to the "time" η. Since the frequency ω refers
to the proper time ξη, it is related to ν by ν = ξω. Hence we recover Eqs. (6) and
(12). A corresponding calculation in the four-dimensional spacetime gives simply
D(ν) = ν² (17)
and leads to the result (4). In an earlier publication,15 we have noted a connection
between Eq. (6) and the statistical mechanics of a dumbbell. Here I claim that the
former can be derived from the latter by sharpening the logic of Ref. 15.
CONCLUDING REMARKS
The last section was rather technical, but the essential point is that we were led
to consider the open Einstein universe because we gave up information beyond the
horizon and restricted ourselves to spacetime region I, i.e., the Rindler wedge. One
might hope that there may still be interesting connections yet to be discovered with
other ideas.
ACKNOWLEDGMENTS
I would like to thank the Yamada Science Promotion Foundation for a travel grant
and to the Santa Fe Institute for financial support covering local expenses, which
enabled me to attend this workshop. I also express my sincere thanks for the hos-
pitality of the Santa Fe Institute and Wojciech Zurek.
REFERENCES
1. Bekenstein, J. D. "Black Holes and Entropy." Phys. Rev. D 7 (1973):2333-
2346.
2. Bekenstein, J. D. "Black-Hole Thermodynamics." Physics Today January
(1980):24-31.
3. Bell, J. S. "On the Einstein-Podolsky-Rosen Paradox." Physics 1 (1964):195-
200.
4. Birrell, N. D., and P. C. W. Davies. Quantum Fields in Curved Space. Cam-
bridge: Cambridge University Press, 1982.
5. Bohm, D. Quantum Theory. Englewood Cliffs, NJ:Prentice-Hall, 1951.
6. Courant, R., and D. Hilbert. Methods of Mathematical Physics, vol. II. New
York: Interscience Publishers, 1962.
7. Davies, P. C. W. "Scalar particle Production in Schwarzschild and Rindler
Metrics." J. Phys. A8 (1975):609-616.
8. DeWitt, B. S. "Quantum Field Theory in Curved Space." Physics Rep. 19
(1975):295-357.
9. DeWitt, B. S. "Quantum Gravity: The New Synthesis." In General Relativity,
edited by S. W. Hawking and W. Israel. Cambridge: Cambridge University
Press, 1979.
10. Einstein, A., B. Podolsky, and N. Rosen. "Can Quantum-Mechanical Descrip-
tion of Physical Reality be Considered Complete?" Phys. Rev. 47 (1935):777-
780.
11. Fulling, S. A. "Nonuniqueness of Canonical Field Quantization in Rieman-
nian Space-Time." Phys. Rev. D 7 (1973):2850-2862.
12. Hawking, S. W. "Particle Creation by Black Holes." Commun. Math. Phys.
43 (1975):199-220.
13. Misner, C. W., K. S. Thorne, and J. A. Wheeler. Gravitation. San Francisco:
Freeman, 1973.
14. Ooguri, H. "Spectrum of Hawking Radiation and the Huygens Principle."
Phys. Rev. D 33 (1986):3573-3580.
15. Ottewill, A., and S. Takagi. "Particle Detector Response for Thermal States
in Static Space-Times." Prog. Theor. Phys. 77 (1987):310-321.
16. Rindler, W. Essential Relativity, 2nd edition. New York:Springer-Verlag,
1977.
17. Sciama, D. W., P. Candelas, and D. Deutsch. "Quantum Field Theory, Horizons,
and Thermodynamics." Adv. in Phys. 30 (1981):327-366.
18. Stephens, C. R. Private communication.
19. Takagi, S. "On the Response of a Rindler Particle Detector." Prog. Theor.
Phys. 72 (1984):505-512.
20. Takagi, S. "On the Response of a Rindler Particle Detector II." Prog. Theor.
Phys. 74 (1985):142-151.
21. Takagi, S. "Vacuum Noise and Stress Induced by Uniform Acceleration."
Prog. Theor. Phys. Suppl. 88 (1986):1-142.
has generated the richness and variety of the present state occurred after the big
bang.[1]
The seemingly unidirectional advance of complex organization, or depth, im-
poses on the universe an arrow of time, which is related to, but distinct from, that
due to the second law of thermodynamics. Some people have perceived an element
of paradox in the growth of organization in a universe in which entropy always
rises. True, the former arrow does challenge the spirit of the second law, which
predicts continual degeneration. But there is no conflict with the letter of the law.
Self-organization costs entropy. But whereas entropy is a measure of information
loss, organization (or depth) refers instead to the quality of information. Entropy
and depth are not each other's negatives.
Among the more interesting complex organized systems to have arisen thus
is the human brain. Containing as it does an internal representation of the phys-
ical world, the brain stands in an unusual relationship with the world. And here
the conjunction of simplicity and complexity is inverted: the brain is incredibly
complex, but the mental states that it supports make the world seem deceptively
simple. We are able to function as human beings because our mental model of the
world bestows upon it a coherent unity. When we talk about "understanding" some
aspect of nature, we mean slotting the phenomena associated therewith into our
existing mental model of "how things are out there."
Is this process of understanding a surprise? Does it tell us anything significant
about the structure of the brain, or the world, or both? Many people have puzzled
about such issues. Why is the universe knowable? After all, given the enormous
complexity and interconnectedness of the physical world, how can we know anything
without knowing everything? Indeed, how can we know anything at all?
As a starting point in addressing these tough questions, let us agree at least on
the following statements:
n There exists a real external world which contains certain regularities. These
regularities can be understood, at least in part, by a process of rational enquiry
called the scientific method.
n Science is not merely a game or charade. Its results capture, however imper-
fectly, some aspect of reality. Thus these regularities are real properties of the
physical universe and not just human inventions or delusions.
In making these assumptions one has to eschew extreme idealistic philosophies,
such as those in which the mind somehow imposes the regularities on the world in
order to make sense of it. Unless one accepts that the regularities are in some sense
objectively real, one might as well stop doing science.
As science progresses, so some regularities become systematized as laws, or
deductions from them. At this epoch the laws found in our textbooks image only
imperfectly the actual regularities. Two points of view can be detected among
practicing scientists regarding the ontological status of these laws. The first is that
there exist "real" laws, or "the correct set" of laws, to which our current theories
"theory of everything."8 Then it will be the case that a very limited period of
mathematical development (300 or 3000 years, depending on where you start) will
have proved sufficient to encapsulate the ultimate laws of the cosmos. But this raises
the curious question of why such a glittering prize, so sweeping in its explanatory
power, demands a nontrivial, yet so astonishingly limited, amount of mathematics.
One can imagine a world in which the principles are transparent to us all at a
glance, or another world in which the principles are impenetrably complicated and
subtle. Given the limitless amount of mathematics which could (and maybe will) be
developed in the (possibly infinite) future, isn't it remarkable that one could have
all of fundamental physics wrapped up with so modest a mathematical investment?
Given that the world does require some subtle and sophisticated mathematics to
describe it, why is it (relatively!) so easy for us to achieve this unifying description?
There is another aspect to this point. Again, assuming a "theory of everything"
is within our grasp, why is it that the requisite mathematics is achievable by the
(severely limited) human brain using an education span that is less than a typical
human life span? I confess I find this exceedingly odd. The learning capabilities of
the brain, and the length of the human life span, are both dictated by Darwinian
criteria, and (presumably) have no connection whatever with the mathematical
form of the fundamental laws of the cosmos.
It is often said that, because the brain is a physical system (i.e., part of the
physical world), it is no surprise that it reflects so efficiently the workings of that
world, i.e., that it generates just that mathematics which express the very laws
of physics that govern its own activity. I consider this to be an entirely erroneous
argument, based on a confusion of conceptual levels (a muddle between hardware
and software). As I have discussed this in detail elsewhere,9 I shall here restrict
myself to a new development that has a bearing on this issue, namely, the question
of computability in physical law.
Most mathematicians subscribe to the so-called Church-Turing hypothesis,
which is to say that a Turing machine, or universal computer, can perform any
computable mathematical operation. In other words, if a mathematical problem is
solvable, a Turing machine can solve it (so long as there is no restriction on the
available memory storage space). This is usually regarded as telling us something
about the foundations of mathematics or logic, but as David Deutsch has pointed
out, it also tells us something about the physical world.10 To perform its modest
repertoire of operations, a Turing machine must employ the laws of mechanics. If
the laws of the physical universe were very different, then some operations that
are computable in our universe might no longer be. Conversely, certain operations
which are non-computable in our universe might be computable in a hypothetical
universe with different laws.
Deutsch expresses it thus:
topologies. Recently, Penrose23 has suggested that the human brain has capabilities
over and above those of a Turing machine, because humans are able to discover the
existence of true mathematical statements that no Turing machine can prove. He
claims that this ability can be traced to the influence of quantum mechanics on
brain processes. (A conventional Turing machine is a classical system.) If either of
these conjectures is correct, it would add a subtle new twist to the question of why
the universe is comprehensible to us.
There is a further tacit assumption running through all these arguments, which
is that the laws of physics are timeless eternal truths. But the intimate relationship
between physics and computation which is emerging from such studies challenges
that assumption. If nature can be viewed as a computational process ("the universe
is a computer" according to Fredkin13), then the form of the physical laws might
be constrained by what can be in principle computed. This point has been made
by Landauer.21 One then has to address the question of the computational limits
of the cosmos. If something cannot be computed by the entire universe during the
age of the universe, in what sense can it be said to be computable?
Might this imply that the laws of physics somehow "fade away" as one goes
back towards the initial singularity, on account of the fact that the computational
power of the universe tends to zero as t → 0? Such a possibility has been suggested
by Lloyd and Pagels. If so, then the laws of physics, along with the state of the
universe, would evolve with time. The laws would somehow emerge from the big
bang, and gradually "congeal" into their timeless form. Such a speculation is not
new, of course; one of its more eloquent proponents is John Wheeler.[4]
To make this more concrete, I should like to point out that one may obtain
a natural measure of the information capacity of the cosmos using the Hawking-Bekenstein
formula for black hole entropy.4,19 If the entire universe were converted
into a black hole, it would conceal a quantity of information I_U given by
I_U ~ GM_U²/(ħc) ,
where M_U is the mass of the observable universe (i.e., within the particle horizon).
At the current epoch, I_U ~ 10^120.
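Plugging in round numbers reproduces the quoted order of magnitude; the figure M_U ~ 10^53 kg below is an assumed value for the mass within the particle horizon, not one taken from the text.

```python
import math

# Order-of-magnitude sketch of I_U ~ G M_U^2 / (hbar c), with an assumed
# round figure for the mass of the observable universe.
G = 6.674e-11      # m^3 kg^-1 s^-2
hbar = 1.055e-34   # J s
c = 2.998e8        # m / s
M_U = 1e53         # kg (assumption, not from the text)

I_U = G * M_U**2 / (hbar * c)
assert 1e120 < I_U < 1e123     # of order 10^120, as stated in the text
```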
REFERENCES
1. Barbour, J. B. "Maximal Variety as a New Fundamental Principle of Dynam-
ics." Found. Phys. 19 (1989):1051.
2. Barrow, J. D., and F. J. Tipler. The Anthropic Cosmological Principle. Ox-
ford: Oxford University Press, 1986, section 10.6.
3. Barrow, J. D. The World Within the World. Oxford: Oxford University Press,
1988, 292.
4. Bekenstein, J. D. "Black Holes and Entropy." Phys. Rev. D 7 (1973):2333.
5. Bennett, C. H. "On the Nature and Origin of Complexity in Discrete, Homo-
geneous, Locally Interacting Systems." Found. Phys. 16 (1986):585.
6. Chaitin, G. J. Algorithmic Information Theory. Cambridge: Cambridge Uni-
versity Press, 1987.
7. Davies, P. C. W. The Cosmic Blueprint. London/New York: Heinemann/
Simon & Schuster, 1988.
8. Davies, P. C. W., and J. R. Brown, eds. Superstrings: A Theory of Every-
thing? Cambridge: Cambridge University Press, 1988.
9. Davies, P. C. W. "Why is the Universe Knowable?" Science and Mathemat-
ics, edited by R. Mickens. Singapore: World Scientific, 1990.
10. Deutsch, D. "Quantum Theory, the Church-Turing Principle and the Univer-
sal Quantum Computer." Proc. Royal Soc. Lond. A 400 (1985):97.
11. Feynman, R. P. The Character of Physical Law. London: BBC Publications,
1965, 172.
12. Ford, J. "What is Chaos, that We Should Be Mindful of It." The New
Physics, edited by P. C. W. Davies. Cambridge: Cambridge University Press,
1989, 348.
13. Fredkin, E. This volume.
14. Geroch, R., and J. B. Hartle. "Computability and Physical Theories." Found.
Phys. 16 (1986).
15. Halliwell, J.J. "Information Dissipation in Quantum Cosmology and the
Emergence of Classical Spacetime," this volume, and the reviews cited
therein.
16. Hamming, R. W. "The Unreasonable Effectiveness of Mathematics." Amer.
Math. Monthly 87 (1980):81.
17. Hartle, J. B., and S. W. Hawking. "The Wave Function of the Universe."
Phys. Rev. D 28 (1983):2960.
18. Hartle, J. B. "Excess Baggage." Talk given at the 60th Birthday Celebration
for Murray Gell-Mann, Pasadena, Jan. 27, 1989.
19. Hawking, S.W. "Particle Creation by Black Holes." Commun. Math. Phys.
43 (1975):199.
20. Hawking, S. W. "Is the End in Sight for Theoretical Physics?" Inaugural Lec-
ture for the Lucasian Chair, University of Cambridge, 1979.
21. Landauer, R. "Wanted: A Physically Possible Theory of Physics." IEEE
Spectrum 4 (1967):105.
70 P. C. W. Davies
1. INTRODUCTION
Algorithmic information content (also known as algorithmic randomness) of a phys-
ical entity is given by the size, in bits, of the most concise message (e.g., of the
shortest program for a universal computer) which describes that entity with the
requisite accuracy. Regular systems can be specified by means of concise descrip-
tions. Therefore, algorithmic information content can be regarded as a measure of
disorder.
Algorithmic randomness is defined without a recourse to probabilities. It pro-
vides an alternative to the usual ensemble measures of disorder: it quantifies ran-
domness of the known features of the state of the physical system. I shall demon-
strate that it is indispensable in formulating thermodynamics from the viewpoint
of the information gathering and using system ("IGUS")—a Maxwell's demon-like
entity capable of performing measurements and of modifying its strategies (for
example, for extraction of useful work) on the basis of the outcomes of the mea-
surements. Such an IGUS can be regarded as a "complex adaptive system." The
aim of this paper is to review the concept of the algorithmic information content
in the context of statistical mechanics and discuss its recently discovered physical
applications.
2. OVERVIEW
Algorithmic randomness, an alternative measure of the information capacity of a
specific physical or mathematical object, was independently introduced in the mid-
60's by Solomonoff,22 Kolmogorov,13 and Chaitin.5 It is based on an intuitively
appealing idea that the information content is equal to the size, in bits, of the
shortest description. Formalization of this idea will be briefly described in the next
section: In its development, it draws on the theory of algorithms and in the process
makes use of the theory of computation,7,19 establishes a firm and useful connection
with Shannon's theory of information,12,20 and benefits from its implications for
coding.
Applications of the algorithmic measures of the information content were ini-
tially mostly mathematical in nature. More recently, Bennett, in an influential
paper,1 has pointed out that the average algorithmic entropy of a thermodynamic
ensemble has the same value as its statistical (ensemble) entropy and, consequently,
one could attempt to build a consistent thermodynamics on an algorithmic foun-
dation.
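The information-theoretic core of Bennett's observation can be sketched with Shannon code lengths standing in for the (uncomputable) algorithmic complexities: an ideal code assigns outcome i a description of about −log2 p_i bits, so the ensemble-averaged description length tracks the statistical entropy to within one bit. The probabilities below are arbitrary illustrative numbers.

```python
import math

# Toy version of the correspondence: a Shannon code assigns outcome i a
# codeword of ceil(-log2 p_i) bits, so the mean description length lies
# within one bit of the ensemble (BGS/Shannon) entropy.
p = [0.4, 0.3, 0.2, 0.1]
H = -sum(q * math.log2(q) for q in p)                    # ensemble entropy, bits
avg_len = sum(q * math.ceil(-math.log2(q)) for q in p)   # mean description length
assert H <= avg_len < H + 1.0
```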
I have applied algorithmic randomness to the problem of measurement as seen
by an observer.25,26,27
Following a measurement, an observer (IGUS) is in possession of a specific
record. From its intrinsic point of view, this record is quite definite. Therefore, fur-
ther analysis of the measurement in terms of the ensemble language is pointless.
Algorithmic Information Content, Church-Turing Thesis, and Physical Entropy 75
Rather, the observer must deal with the specific measurement outcomes and with
their implications for extraction of useful work. In this "Maxwell's demon" context,
algorithmic randomness assumes, in part, a function analogous to the Boltzmann-
Gibbs-Shannon entropy: From the observer's point of view, the second law of ther-
modynamics must be formulated by taking into account both the remaining igno-
rance (measurements are typically far from exhaustive) and the randomness in the
already available data. Thus, the physical entropy, the quantity which allows for the
formulation of thermodynamics from the viewpoint of the observer, must consist of
two contributions:
is its BGS entropy, and K(ρ) is the size of the shortest program capable of describing
ρ:
K(ρ) = |ρ*| . (2.3)
This recent proposal for physical entropy will be described in more detail in sections
3 and 4.
In section 4, I shall also discuss the importance of "compression" of the acquired
data to their most concise form: thermodynamic efficiency of an IGUS-operated en-
gine depends on its ability to find concise measurement descriptions. In turn, this
efficiency can be regarded as a consequence of the IGUS's ability to "understand"
or "model" the part of the Universe employed as the "engine" in terms of the regu-
larities which can be regarded as analogs of physical laws. In this sense, "intellectual
capabilities" of an IGUS are quite critical for its "success."
01010101010101010101 (3.1)
76 W. H. Zurek
and
10110100101100010111. (3.2)
The first system is "regular": It can be simply and concisely described as 10 "01"s.
There is no equally concise description of the second spin system. To reconstruct
it, one would have to have a "verbatim" description (Eq. 3.2); there is no way to
"compactify" this description into a more concise message.
The concept of algorithmic information content (known also as the algorithmic
randomness, algorithmic complexity, or algorithmic entropy) captures this intuitive
difference between the "regular" and "random" binary sequences.
Algorithmic information content of a binary sequence s is defined as the size of
the minimal program, s*_U, which computes sequence s on the universal computer
U:

K_U(s) = |s*_U| .   (3.3)
Above, vertical lines indicate the size of the binary string in bits. It is important to
note that this definition of the algorithmic information content makes it explicitly
subjective in the sense that it is computer dependent; hence, the subscript U in
Eq. (3.3). However, by the very definition of the universal computer, programs
executable on U can also be executed (and will yield the same output) on any other
universal computer U', provided that they are preceded by a prefix τ_{U'U}, which
depends on U and U', but not on the program. Hence, algorithmic randomness
defined for two different computers will differ by at most the size of the prefix[1]:

K_{U'}(s) ≤ K_U(s) + |τ_{U'U}| .
[1] Note that here we have used the vertical lines in two different ways: on the left-hand side they
stand for the absolute value, while on the right-hand side they indicate the size of the binary string in
bits. Only this second meaning will be employed below.
FIGURE 1 Turing machine T uses a set of instructions residing inside its "central
unit" as well as the input it reads in by means of the "head" scanning the input tape
to modify the content of the tape. A Universal Turing machine U can simulate any
other Turing machine by reading on the input tape the "description" of T. In particular,
a single-tape U can simulate operations of the modern computers, which can be
modelled as "polycephalic" Turing machines with access to several multidimensional
tapes and other kinds of hardware. Such "modern" machines (one of which is illustrated
above) may be more convenient to use, but their capabilities are limited to the
same range of "computable" tasks as for the original, one-tape U. This universality
justifies the importance attached to the universal computers. In particular, it limits the
subjectivity of the algorithmic information content defined by means of the minimal
program, Eq. (3.3).
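The simulation of one machine by another, described in the caption, can be sketched in a few lines. The table encoding of the machine and the bit-flipping example below are my own illustrative assumptions, not constructions from the text:

```python
def run(machine, tape, state="A", steps=500):
    """Minimal Turing-machine interpreter. 'machine' maps
    (state, symbol) -> (write, move, next_state); feeding such a
    description in as data is the essence of a universal machine U
    simulating an arbitrary machine T."""
    cells = dict(enumerate(tape))
    head = 0
    for _ in range(steps):
        if state == "halt":
            break
        write, move, state = machine[(state, cells.get(head, "_"))]
        cells[head] = write
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells)).strip("_")

# A toy T: flip every bit of the input, halt at the first blank ("_").
flipper = {
    ("A", "1"): ("0", "R", "A"),
    ("A", "0"): ("1", "R", "A"),
    ("A", "_"): ("_", "R", "halt"),
}
assert run(flipper, "1011") == "0100"
```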
Probability associated with the output o will be given by the sum over all inputs
which yield the given output o:

P(o) = Σ_{s: U(s)=o} 2^(−|s|) .   (3.9)

The dominant contribution to this sum will come from the shortest program. Hence,

K(o) ≅ −log₂ P(o) .   (3.10)
This connection between the probability that a given string will be generated
by a random input and its algorithmic information content can be employed in
proving that the average algorithmic randomness of a member of a simply described
("thermodynamic") ensemble is almost identical to its Boltzmann-Gibbs-Shannon
entropy.1,4,26 The relevant double inequality

H({p(si)}) ≤ Σ_i p(si) K(si) ≤ H({p(si)}) + K(si, {p(si)}) + O(1)   (3.14)

has been demonstrated by Levin16,17 and Chaitin5 (see, also, Bennett3 for a more
accessible presentation and Caves4 for a discussion in the physical context). Above,
K(si, {p(si)}) is the size of the minimal description of the ensemble. Bennett3 has
pointed out that, in the case when the BGS entropy is large compared with the size
of the ensemble description:
H({p(si)}) = − Σ_{si} p(si) log₂ p(si) >> K(si, {p(si)}) ,   (3.15)
one could base thermodynamic formalism on the average algorithmic randomness
of the ensemble.
In a recent paper I have considered an application of algorithmic randomness
to the situation in which an observer attempts to extract maximum useful work
from the system on the basis of partial measurements.26 In the next section I shall
discuss this situation, which forces one to consider physical entropy defined as the
sum of the remaining ignorance H({p(si)}) and of the cost of storage of the available
information K(si, {p(si)}).
The last quantity which can be defined in the algorithmic context is the al-
gorithmic information distance given by the sum of the conditional information
contents:
Δ(s, t) = K(s|t) + K(t|s) .   (3.16)
Algorithmic information distance satisfies the requirements expected of a metric.25
In addition to the "simple" distance defined by Eq. (3.16), one can consider
several related quantities. For example,
K(s!t!u) = K(s|t,u) + K(t|u,s) + K(u|s,t)   (3.17)

is also positive, reflexive, and satisfies the obvious generalization of the triangle inequal-
ity. Hence, K(s!t!u) and its further generalizations involving more strings can be
regarded as direct extensions of Δ(s,t) = K(s!t).
It is sometimes useful to express distance as the difference between the joint
and mutual information content
Δ'(s, t) = K(s,t) − K(s:t) ,   (3.18)

where the mutual information is given by

K(s:t) = K(s) + K(t) − K(s,t) .   (3.19)
The quantity Δ' defined by Eq. (3.18) differs from the "original" distance in Eq.
(3.16) by logarithmic terms because of the similar logarithmic errors entering into
Eq. (3.10). The advantage of employing Eq. (3.18) is its intuitive appeal: The dis-
tance between two binary strings is the information which they contain but do not
share.
Mutual information can also be used to define algorithmic independence of two
strings: s and t are independent when K(s:t) is small; for example,

K(s:t) << min (K(s), K(t)) .
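These definitions can be mimicked numerically with a real compressor standing in for K; compressed sizes approximate the algorithmic quantities only up to overhead, and the helper names below are my own illustrative choices:

```python
import random
import zlib

def C(data):
    """Compressed size in bytes -- a computable stand-in for K."""
    return len(zlib.compress(data, 9))

def distance(s, t):
    """Proxy for Eq. (3.18): Delta'(s,t) = K(s,t) - K(s:t), which by
    Eq. (3.19) equals 2*K(s,t) - K(s) - K(t)."""
    return 2 * C(s + t) - C(s) - C(t)

rng = random.Random(1)
a = bytes(rng.randrange(2) for _ in range(400))  # a random bit-string
b = a                                            # shares all information with a
c = bytes(rng.randrange(2) for _ in range(400))  # independent of a

# Strings that share their information are "close"; independent ones are not.
assert distance(a, b) < distance(a, c)
```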
Information distance can also be defined for statistical (that is, BGS) entropy.
In this case, Δ and Δ' coincide. Indeed, information distance was independently
discovered in the domain of the Shannon's information theory by at least three
authors before it was discussed (again without the benefit of knowledge of these
references) by this author25 in the algorithmic context.
The connection between the mathematical model of an "intelligent being" and ther-
modynamics goes back to the above-mentioned paper by Szilard.23 In the analysis
of the famous one-gas-particle engine (Figure 3), Szilard concluded that the second
law could indeed be violated by a fairly simple "demon" unless the cost of mea-
surements is no less than kBT per bit of acquired information. Further, essential
clarification of the situation is due to the recent work by Bennett,1,2 who, basing his
discussion on the earlier considerations of Landauer14,15 on the costs of information
erasure, concluded that it is the "resetting" of the measuring apparatus which is
thermodynamically expensive and must be responsible for restoring the validity of
the second law in Szilard's engine. (Indeed, this observation was anticipated, if only
in a somewhat half-hearted manner, by Szilard.23)
Algorithmic randomness proved essential in attempts to generalize this discus-
sion of Maxwell's demon.4,25,26,27 The validity of the original argument about the
cost of erasure was limited to the context of Szilard's engine. In that case, the out-
come of the measurement can always be described by a single bit. Hence, the gain
of useful work in the course of the expansion is given by

ΔW₊ = kBT .   (4.1)
Note that above we are using Boltzmann constant kB which differs from the usual
one by a factor of In 2. This distinction reflects the difference between entropy
measured in "bits" and "nats."
This gain of useful work is "paid for" by ΔW₋, the energy needed to restore
the memory part of the "brain" of the "demon" to the "blank," "ready to measure"
state:

ΔW₋ = −kBT .   (4.2)
FIGURE 3 Szilard's engine employs a one-molecule gas in contact with a heat bath
at temperature T to extract kBT ln 2 of work per cycle (which is illustrated in a self-
explanatory manner above). The measurement which establishes the location of the
molecule is crucial. The importance of the cost of erasure for the proper accounting for
the net energy gain is discussed in the text.
It is, nevertheless, far from clear how to apply this "cost of erasure" argument to
less idealized and more realistic situations.
One simple (although not very realistic) generalization is to consider a sequence
of measurements on the Szilard's engine and to postpone the "erasure" indefinitely.
This requires a demon with a significant memory size. One can then, as noted by
Bennett,I,2 use Szilard's engine to extract kBT of work per cycle as long as there is
"empty" tape. This is, of course, only an apparent violation of the second law since
the empty tape can be regarded as a zero-entropy (and, hence, zero-temperature)
reservoir. Consequently, an ideally efficient engine can, in accord with the second
law and, in particular, with the Carnot efficiency formula, attain exactly kBT of
work per cycle.
The cost of erasure does not have to be paid for as long as the "memory tape" is
available. However, for this very reason, the process is not truly cyclic: the demon's
memory is never restored to the initial "blank" state. The gain of useful work is paid
for by the "clutter" in its "brain." If the outcomes of consecutive measurements
are random, getting rid of this clutter would cost kBT per bit, and all the apparent
gain of work would have to be "paid back" by the final costs of erasure.
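The bookkeeping of this paragraph can be sketched directly. The function and parameter names are illustrative, and kB T is absorbed into the unit of work, as in the text's "bit" units:

```python
K_B_T = 1.0  # work per bit of record, in units where k_B T = 1

def szilard_run(cycles, blank_tape):
    """Run the engine while postponing erasure: each cycle extracts
    k_B T of work but fills one blank cell of the memory tape with the
    (random) one-bit measurement record. Erasing the record at the end
    costs k_B T per bit."""
    work = 0.0
    used = 0
    for _ in range(min(cycles, blank_tape)):
        work += K_B_T   # isothermal expansion after the measurement
        used += 1       # one-bit record: which side the molecule was on
    erasure_cost = used * K_B_T
    return work, erasure_cost

w, cost = szilard_run(cycles=100, blank_tape=100)
assert w == cost   # for random outcomes the gain is fully "paid back"
```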
Each measurement results in filling up n blanks of the tape with 0's and 1's. Hence,
the cost of erasure would be

ΔW₋ = −n kBT .
Again, one could postpone erasures indefinitely and just "dump" all of the "clut-
tered-up" tape into the garbage can. In the final count, however, the cost of erasure
(linear in n) would outweigh the gain of useful work (which is logarithmic in n).
FIGURE 4 A Turing machine converts the raw measurement record on its input tape (the "tape supply") into a "compressed record."
There is, however, a limit on the compressibility, which relates the average size ⟨ΔK⟩
of the record with the decrease ΔH of the statistical entropy of the measured system
via an inequality:

⟨ΔK⟩ ≥ ΔH .   (4.6)

For, unless this inequality holds, the gain of useful work, which is equal to

ΔW₊ = kBT ΔH ,

would exceed the cost of erasing the record, kBT⟨ΔK⟩. Hence, the net average gain
of useful energy per cycle would be

⟨ΔW⟩ = kBT (ΔH − ⟨ΔK⟩) .

The second law demands that ⟨ΔW⟩ ≤ 0, which leads to the inequality (4.6).
Fortunately, this inequality is indeed respected: It is an immediate consequence
of the left-hand side of the inequality in Eq. (3.14). Indeed, it follows from the first
basic result of Shannon's theory of communication (the so-called noiseless channel
coding theorem; see Shannon and Weaver,20 Khinchin,12 Hamming,10 and Caves4
for discussion): The average size of minimal "descriptions" needed to unambigu-
ously describe measurement outcomes cannot be made smaller than the statisti-
cal entropy of the "source" of information (in our case, of the measured physical
system). In this context, the second law can be regarded as a direct consequence
of the Kraft inequality,10 which plays a basic role in coding theory and thus enters
physics25,26: Suppose that {Ki} are the sizes (in the number of bits) of distinct
symbols (programs) {si} which correspond to different signals (measurement
outcomes). Then one can prove that in order for the encoding to be uniquely
decodable, the following inequality must be obeyed:

Σ_i 2^(−Ki) ≤ 1 .   (4.10)
The inequality (4.6) follows from the Kraft inequality (4.10), since (4.10) can be
immediately rewritten as

log₂ Σ_i p(si) (2^(−Ki) / p(si)) ≤ 0 ,

where p(si) are the probabilities corresponding to signals (states) si. Now, employ-
ing convexity of the logarithm, one can write

0 ≥ log₂ Σ_i p(si) (2^(−Ki) / p(si)) ≥ Σ_i p(si) log₂ (2^(−Ki) / p(si)) = H({p(si)}) − Σ_i p(si) Ki ,

so that the average record size satisfies Σ_i p(si) Ki ≥ H({p(si)}), which is the
content of the inequality (4.6).
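The chain of inequalities above is easy to check for a concrete dyadic ensemble (an illustrative sketch; the probabilities and helper names are my own choices):

```python
import math

def shannon_fano_lengths(probs):
    """K_i = ceil(-log2 p_i): lengths of uniquely decodable code words;
    by construction 2^-K_i <= p_i, so the Kraft inequality holds."""
    return [math.ceil(-math.log2(p)) for p in probs]

probs = [0.5, 0.25, 0.125, 0.125]
K = shannon_fano_lengths(probs)

kraft_sum = sum(2.0 ** (-k) for k in K)
H = -sum(p * math.log2(p) for p in probs)
avg_K = sum(p * k for p, k in zip(probs, K))

assert kraft_sum <= 1.0   # Kraft inequality, Eq. (4.10)
assert avg_K >= H         # average record size bounded by entropy, Eq. (4.6)
```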
FIGURE 5 The effect of measurements on (i) the Shannon entropy Hd in the presence
of the partial information—data d; (ii) the algorithmic information content of the data,
K(d); and (iii) the physical entropy Sd ≈ Hd + K(d), which measures the
net amount of work that can be extracted from the system given the information
contained in the data d. (a) When the measurements are carried out on the equilibrium
ensemble, the randomness in the data increases at the rate given by the decrease of
ignorance. (b) For systems far from equilibrium the increase of randomness is smaller
than the decrease of ignorance, which allows the observer to extract useful work and
makes measurements energetically attractive.
ACKNOWLEDGMENTS
I would like to thank Charles Bennett, Carl Caves, Stirling Colgate, Doyne
Farmer, Murray Gell-Mann, James Hartle, Rolf Landauer, Seth Lloyd, Bill Unruh,
and John Wheeler for stimulating and enjoyable discussions on the subject of this
paper. The warm hospitality of the Aspen Center for Physics, the Institute for
Theoretical Physics in Santa Barbara, and the Santa Fe Institute is very much
appreciated.
REFERENCES
1. Bennett, C. H. Int. J. Theor. Phys. 21 (1982):305-340.
2. Bennett, C. H. Sci. Am. 255(11) (1987):108-117.
3. Bennett, C. H. In The Universal Turing Machine-A Half-Century Survey,
edited by R. Herkin. Oxford: Oxford University Press, 1988.
4. Caves, C. M. This volume.
5. Chaitin, G. J. J. ACM 13 (1966):547-569.
6. Chaitin, G. J. Sci. Am. 232(5) (1975):47-52.
7. Davis, M. Computability and Unsolvability. New York: Dover, 1973.
8. Feynman, R. P., R. B. Leighton, and M. Sands. Feynman Lectures on Physics,
sect. 46, vol. 1. Reading, MA: Addison-Wesley, 1964.
9. Gödel, K. Monatsh. Math. Phys. 38 (1931):173-198.
10. Hamming, R. W. Coding and Information Theory. Englewood Cliffs: Prentice-
Hall, 1987.
11. Hofstadter, D. Gödel, Escher, Bach. New York: Random House, 1979.
12. Khinchin, A. I. Information Theory. New York: Dover, 1957.
13. Kolmogorov, A. N. Information Transmission 1 (1965):3-11.
14. Landauer, R. IBM J. Res. Dev. 3 (1961):113-131.
15. Landauer, R. In Signal Processing, edited by S. Haykin. New York: Prentice-
Hall, 1989, 18-47.
16. Levin, L. A. Dokl. Akad. Nauk SSSR 227 (1976).
17. Levin, L. A. Sov. Math. Dokl. 17 (1976):522-526.
18. Penrose, R. The Emperor's New Mind. Oxford: Oxford University Press,
1989.
19. Rogers, H. Theory of Recursive Functions and Effective Computability. New
York: McGraw-Hill, 1967.
20. Shannon, C. E., and W. Weaver. The Mathematical Theory of Communica-
tion. Urbana: Univ. of Illinois Press, 1949.
21. Smoluchowski, M. In Vorträge über die kinetische Theorie der Materie und
der Elektrizität. Leipzig: Teubner, 1914.
22. Solomonoff, R. J. Info. & Control 7 (1964):1-22.
23. Szilard, L. Z. Phys. 53 (1929):840-856.
24. Turing, A. M. Proc. Lond. Math. Soc. 42 (1936):230-265.
25. Zurek, W. H. Nature 341 (1989):119-124.
26. Zurek, W. H. Phys. Rev. A 40 (1989):4731-4751.
27. Zurek, W. H. In Proceedings of the International Symposium on Quantum
Mechanics, edited by Y. Murayama. Tokyo: Physical Society of Japan, 1990.
Carlton M. Caves
Center for Laser Studies, University of Southern California, Los Angeles, California
90089-1112
Suppose the memory examines the system and discovers which state it is in. The
average information that the memory acquires is quantified by the Gibbs-Shannon
statistical information9,22

H = − Σ_j p(j) log p(j) .   (1.1)
STATISTICAL INFORMATION
Consider, however, a "system" that can occupy one of 𝒥 states, labeled by an index
j, and let J be the set of values of j. Think of J as a property or quantity of the
system, which takes on the values j. A good example to keep in mind is a physical
system with a macroscopic number of degrees of freedom, f ~ 10^24 ≈ 2^80; the
states are system microstates—classically, phase-space cells; quantum mechanically,
perhaps energy eigenstates.
Suppose that a memory, based on prior information π, assigns probability
p(j|π) to find the system in state j. How much information is "stored" in the
system (or in the quantity J)? Evidently, the information stored is not a prop-
erty of the system alone; it must be defined relative to the information that the
memory already has. For example, if the memory assigns uniform probabilities
p(j|π) = 𝒥^(−1), it regards the system as storing log 𝒥 bits (the amount of "space"
available in the system), but if the memory knows which state the system is in, it
regards the system as storing no information. Probabilities do play a role now.
Linguistic precision, therefore, requires dropping the system-specific term "in-
formation stored." The question should be rephrased as, "How much information
does the memory acquire when it finds the system in a particular state?" The an-
swer, on the average, is well known—it is the statistical information (1.1)—but a
brief review highlights what the answer means.
Consider, then, a Gibbs ensemble consisting of N systems, distributed among
states according to the probabilities p(j|π). When N is very large, the only ensemble
configurations with nonvanishing probability are those for which the states j occur
with frequencies p(j|π). Each such configuration has probability

P ≈ Π_j [p(j|π)]^(N p(j|π)) = 2^(−N H(J|π)) .
The statistical information H(J|π) is the code-word length per system needed to code
an arbitrarily large Gibbs ensemble of systems. This way of thinking makes direct
contact with the primitive notion of "space" as the amount of information, but
it has disadvantages associated with the use of a Gibbs ensemble. Since the code
words are assigned to ensemble configurations, there is no way to consider individual
members of the ensemble. In particular, it is impossible to identify the information
that the memory acquires when it finds a member of the ensemble in a particular
state.
This difficulty can be circumvented by considering variable-length "instanta-
neous" coding.8 One assigns to each system state j a distinct binary code word,
whose length can vary from one state to another. A message in this code is a
sequence of code words, signifying a sequence of system states. The code words
make up an instantaneous or prefix-condition code if no code word is a prefix for
any other code word. (An instantaneous binary code can also be viewed profitably
as a dichotomic tree search.8,25) A message in an instantaneous code is uniquely
decodable with no requirement for end markers to separate successive code words.
Although instantaneous codes are not the only codes with this property, they are
special in the following sense: a message in an instantaneous code is uniquely de-
codable as soon as each code word is completed.
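The defining property—each code word recognized the moment it is completed—can be demonstrated in a few lines (the codebook and function names below are illustrative choices):

```python
def decode(message, codebook):
    """Decode an instantaneous (prefix-condition) code: no end markers
    are needed, and each word is decoded as soon as it completes."""
    inverse = {w: s for s, w in codebook.items()}
    out, word = [], ""
    for bit in message:
        word += bit
        if word in inverse:        # word complete -> emit immediately
            out.append(inverse[word])
            word = ""
    assert word == "", "message ended in the middle of a code word"
    return out

# No code word is a prefix of any other.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
assert decode("0101100111", code) == ["a", "b", "c", "a", "d"]
```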
An immediate consequence of the convexity of the log function is that, no mat-
ter what the probability assignment p(j|π), the average length of an instantaneous
binary code is not less than the statistical information:8,25

Σ_j p(j|π) ℓ_j ≥ H(J|π) .   (2.4)
Equation (2.4) provides a strict lower bound on the average word length, but it gives
no hint how closely this lower bound can be approached. There is an explicit pro-
cedure, due to Huffman,8,11,25 for constructing an optimal code—one with smallest
average length. Huffman's procedure is an optimal realization of the idea that one
should assign long code words to states with low probability and short code words to
states with high probability. More useful here is a non-optimal coding procedure,8,25
which Zurek27 calls Shannon-Fano coding, for which an upper bound on average
length can be established. In Shannon-Fano coding, one assigns to state j a code
word whose length ℓ_j is the smallest integer greater than or equal to −log p(j|π); thus
the length satisfies

−log p(j|π) ≤ ℓ_j < −log p(j|π) + 1 .   (2.5)
That such a code can be constructed follows from a condition known as the Kraft
inequality8,25: there exists an instantaneous binary code for a particular set of code-
word lengths if and only if those lengths satisfy Σ_j 2^(−ℓ_j) ≤ 1. Shannon-Fano coding
satisfies the Kraft inequality as a consequence of the left inequality in Eq. (2.5).
Averaging the right inequality in Eq. (2.5), one finds that Shannon-Fano codes obey
the inequality

Σ_j p(j|π) ℓ_j < H(J|π) + 1 .   (2.6)
Optimal codes, which cannot have greater average code-word length, satisfy the
same inequality, although the word lengths of an optimal code do not necessarily
satisfy Eq. (2.5).
Combining Eqs. (2.4) and (2.6) yields upper and lower bounds that relate av-
erage code-word length for optimal and Shannon-Fano codes to statistical informa-
tion:

H(J|π) ≤ Σ_j p(j|π) ℓ_j < H(J|π) + 1 .   (2.7)
All instantaneous codes obey the lower bound, whereas the upper bound is a ques-
tion of existence: there exist codes, such as Huffman and Shannon-Fano codes,
whose average length lies within the upper bound. One may interpret the code-word
length ℓ_j as the amount of information that the memory acquires when it finds the
system in state j; given the bounds (2.7), the average information acquired by the
memory is close to the statistical information. Indeed, for a macroscopic physi-
cal system near thermodynamic equilibrium, the statistical information is roughly
H ∼ f ≈ 2^80, so the statistical information and the optimal average code-word
length are essentially identical.
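Huffman's merging procedure mentioned above can be sketched as follows; the heap-based implementation and the example probabilities are my own illustrative choices:

```python
import heapq
import math
from itertools import count

def huffman_lengths(probs):
    """Code-word lengths of an optimal (Huffman) instantaneous binary
    code: repeatedly merge the two least probable groups; every symbol
    in a merged group sits one level deeper in the code tree."""
    tie = count()                      # deterministic tie-breaking
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        for i in a + b:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), a + b))
    return lengths

probs = [0.4, 0.3, 0.2, 0.1]
L = huffman_lengths(probs)
H = -sum(p * math.log2(p) for p in probs)
avg = sum(p * l for p, l in zip(probs, L))
assert H <= avg < H + 1   # the bounds of Eq. (2.7)
```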
This interpretation, though appealing, is not wholly satisfactory. Why such
emphasis on instantaneous codes? Why should the length ℓ_j be the relevant amount
of information, when the code word represents the state j only in the sense of
a somewhat arbitrary look-up table? Where is the necessary dependence of the
amount of information on the prior information? These questions can be answered
by framing the discussion in terms of algorithmic information theory, which deals
with the length of programs on a universal computer. Because the programs are
required to be "self-delimiting," they constitute an instantaneous code. The code-
word length ℓ_j becomes the length of a minimal program which, when added to
a minimal program for generating the probability assignment p(j|π), causes the
universal computer to produce a complete description of the state j. This minimal
program length is the irreducible information content of the state j, relative to the
prior information, and has every right to be called the information that the memory
acquires when it finds the system in state j.
Before turning to algorithmic information, however, it is useful to list the prop-
erties of statistical information for two (or more) quantities.8 For that purpose,
consider two quantities, J and K, which take on values j and k. To describe the as-
signment of probabilities in this situation, one says that a memory, based on prior in-
formation π, assigns joint probability p(j,k|π) = p(j|π) p(k|j,π) = p(k|π) p(j|k,π)
to find values j and k. One defines the conditional statistical information

H(K|J,π) = − Σ_{j,k} p(j,k|π) log p(k|j,π) .
ALGORITHMIC INFORMATION
Algorithmic information theory4,5,6,7,15,16,19,20,23 has been developed over the
last twenty-five years, in large part to make rigorous the notion of a random num-
ber. Here I give only the barest summary of the principal ideas. More extensive
summaries, aimed at the physics applications pursued here, can be found in the
two recent papers by Zurek.26,27
Algorithmic information theory deals with a universal computer—for example,
a universal Turing machine—which computes binary strings—sequences of 0's and
1's—and n-tuples of binary strings. I use the letters p, q, and r to denote binary
strings, and I let |q| be the length of the string q. Suppose one settles on a particular
universal computer. A program p for this computer is a string such that, when
the computer is presented with p, it embarks on a computation that halts after
producing an output string. The program must halt, and as there is no provision
for program end markers, p must carry within itself information about when to
halt. Such programs, called self-delimiting, constitute an instantaneous code: no
program can be a prefix for any other program.
The absolute algorithmic information I(q) of a string q is the length of the
minimal program q*—the program of shortest length—that produces q as its output:

I(q) ≡ |q*| , q* the minimal program for q .   (2.14)
Choosing a different universal computer can change the length of the minimal pro-
gram by an additive constant that depends on the choice of computer, but not on
the length of q. Thus, to within this constant, algorithmic information is precisely
defined and quantifies the irreducible information content of the string q. Reflect-
ing this computer dependence, equalities and inequalities in algorithmic information
theory are proven to within 0(1)—i.e., to within the computer-dependent additive
constant. Following Zurek,26 I use "physicist's notation"—≃, ≲, ≳—to denote
approximate equality and inequality to within O(1).
Some strings are algorithmically simple; such a string can be generated by a
program much shorter than its own length. Most strings, however, are algorithmi-
cally random in the sense that the simplest way to generate them is to list the entire
string. Indeed, the absolute algorithmic information of a typical (random) string is
The leading term Iql is the number of bits needed to list the entire string; the
logarithmic term log IqI is the number of bits needed to specify the length IqI of the
string, information the program needs in order to be self-delimiting.
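A toy self-delimiting encoding makes the idea concrete. This is not the construction used in the formal theory: here each length bit is doubled, giving roughly 2 log₂|q| bits of overhead rather than log|q|, and the function names are my own:

```python
def self_delimiting(q):
    """Encode bit string q so a reader knows where it ends: send len(q)
    in binary with every bit doubled, then '01' as a terminator, then q.
    Doubled bits are '00' or '11', so '01' never occurs in the header."""
    n = format(len(q), "b")
    header = "".join(bit * 2 for bit in n) + "01"
    return header + q

def read_self_delimiting(stream):
    """Return (q, rest of stream) -- no external end marker is needed."""
    bits, i = "", 0
    while stream[i:i + 2] != "01":
        bits += stream[i]
        i += 2
    n = int(bits, 2)
    start = i + 2
    return stream[start:start + n], stream[start + n:]

q = "10110100"
decoded, rest = read_self_delimiting(self_delimiting(q) + "0011")
assert decoded == q and rest == "0011"
```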
Extension of algorithmic information to n-tuples of strings is straightforward.
For example, the joint algorithmic information I(q, r) is the length of the minimal
program to generate string q followed by string r, the two strings separated by
some punctuation such as a comma. Though straightforward, the extension to n-
tuples reveals the importance of the restriction to self-delimiting programs. With
self-delimiting programs, it is easy to convince oneself that I(q,r) ≲ I(q) + I(r),
because minimal programs for q and r can be concatenated.5,6,19,20 In contrast, if
the programs are not self-delimiting, the concatenated program needs to contain
information about where one program ends and the next begins; as a consequence,
the inequality holds only if one adds logarithmic corrections of order log I(q) +
log I(r) to the right side.5,6
The generalization to n-tuples allows one to define conditional algorithmic in-
formation. Suppose the computer is given the minimal program r* for r as an
"oracle." One may then consider programs which, when added to r*, cause the
computer to calculate an output string; notice that these conditional programs,
being self-delimiting, form an instantaneous code. Now let q*_r be the minimal pro-
gram that must be added to r* so that the computer produces q as its output. The
conditional algorithmic information I(q|r) is the length of q*_r:

I(q|r) ≡ |q*_r| .   (2.16)

It is crucial in this definition of I(q|r) that the computer be given the minimal
program r*, rather than r, as an oracle [equivalently, it could be given r and I(r)].5
With this definition, it is possible to show that I(q|r) ≲ I(q); if the computer is
given only r instead, this inequality holds only if one adds logarithmic corrections
Entropy and Information 99
of order log I(r) to the right side.5 One can now define the mutual algorithmic
information

I(r;q) ≡ I(q) − I(q|r) ≳ 0 ,   (2.17)

which quantifies the extent to which knowledge of r* allows the computer to use a
shorter program to compute q.
Relations among the various kinds of algorithmic information can be summa-
rized by

I(q) + I(r|q) ≃ I(q,r) ≃ I(r,q) ≃ I(r) + I(q|r) ,   (2.18)

I(r;q) = I(q) − I(q|r) ≃ I(r) − I(r|q) = I(q;r) .   (2.19)
Aside from the 0(1) equalities, these relations are identical to those for statis-
tical information [Eqs. (2.12) and (2.13)], yet algorithmic information deals with
individual strings whereas statistical information is couched in terms of averages.
Return to the system introduced above and its states j. Since algorithmic informa-
tion is defined in terms of binary strings, associate with each state j a specifying
string r_j, which completely describes the state. For a classical physical system, r_j is
a list of phase-space coordinates (to some accuracy), which specifies a phase-space
cell; for a quantum-mechanical system, r_j might be an energy eigenvalue that spec-
ifies an energy eigenstate. The absolute algorithmic information I(r_j) of state j is
the length of the minimal program r*_j to generate the specifying string r_j.
The memory uses prior information π to enumerate system states and assign
them probabilities. To include this prior information within the language of algo-
rithmic information theory, imagine that the memory stores a program which, when
presented to the universal computer, causes the computer to list the states and their
probabilities. This sort of program has been considered by Zurek.27 To formalize
it, let p_{J|π} denote an n-tuple of strings that consists of the specifying strings r_j for
all states with non-zero probability and, associated with each string, its probability
p(j|π). Since I want to use the full apparatus of algorithmic information theory—in
particular, to have self-delimiting programs for computing p_{J|π}—I insist that the
number 𝒥 of system states (with non-zero probability) be finite (unlike Zurek) and
that the probabilities be given to some finite accuracy. For a physical system in
thermodynamic equilibrium, this requirement can be accommodated by using the
microcanonical ensemble or by using the canonical ensemble with an algorithmically
simple maximum energy. The irreducible information content of the prior informa-
tion π is I(p_{J|π}) = |p*_{J|π}|, the length of the shortest program p*_{J|π} to compute p_{J|π}.
I call I(p_{J|π}) the algorithmic prior information.
Two fundamental inequalities place upper and lower bounds on the average
absolute algorithmic information:

H(J|π) ≤ Σ_j p(j|π) I(r_j) ≲ H(J|π) + I(p_{J|π}) .   (3.1)
Bennett pointed out the relevance of these bounds to the entropy of physical sys-
tems, and Zurek27 has given an extensive discussion of their importance in that
context. That the computer programs form an instantaneous code leads immedi-
ately to the left inequality in Eq. (3.1). The right inequality is related to inequal-
ity (2.6) for Shannon-Fano codes. I give here a précis of Zurek's proof27 of the right
inequality, which in turn is based on the proof of Zvonkin and Levin.29
The minimal program p*_{J|π} alone generates the specifying strings r_j and their
probabilities p(j|π). Suppose one adds to p*_{J|π}, first, a sorting algorithm that sorts
the strings r_j in order of decreasing probability (or in some other well-defined order
for equal probabilities), and, second, a coding algorithm that assigns to each string
r_j a Shannon-Fano code word, whose length satisfies ℓ_j < −log p(j|π) + 1. The
sorting and coding algorithms are algorithmically simple; their overall length I_sc
can be included in the computer-dependent additive constant. One can now obtain
a program to generate a particular specifying string r_j by adding to the above
program the code word for r_j. Since the code word is part of an instantaneous code,
there is no need to supply the length of the code word. The result is a program for
computing r_j whose length [to within O(1)] is given by I(p_{J|π}) + I_sc + ℓ_j ≲ I(p_{J|π}) −
log p(j|π), where the latter inequality follows from including I_sc and the 1 from ℓ_j
in the computer-dependent additive constant. The minimal program for generating
r_j being no longer than this one, one arrives at an inequality

I(r_j) ≲ I(p_{J|π}) − log p(j|π) ,   (3.2)

whose average over p(j|π) yields the right inequality in Eq. (3.1).
Important though Eq. (3.1) is, it is unsatisfactory because it is couched in
terms of absolute algorithmic information. More useful are inequalities involving the
conditional algorithmic information I(r_j|p_{J|π}), which is the length of the minimal
program for r_j when the computer is given p*_{J|π} as an oracle. If the memory stores
p*_{J|π}, then I(r_j|p_{J|π}) is the length of the additional program that the memory must store
to generate r_j; thus I(r_j|p_{J|π}) may be interpreted as the amount of information that
the memory acquires when it finds the system in state j, given the prior information
π. Since the conditional programs form an instantaneous code, their average length
is bounded below by the statistical information:

Σ_j p(j|π) I(r_j|p_{J|π}) ≥ H(J|π) .   (3.3)

An argument parallel to the one leading to Eq. (3.2) bounds the joint information:

I(p_{J|π}, r_j) ≲ I(p_{J|π}) − log p(j|π) .   (3.4)
Writing the left side of this inequality as I(p_{J|π}, r_j) ≃ I(r_j) + I(p_{J|π}|r_j) yields a
new upper bound for I(r_j) in terms of mutual information,

I(r_j) ≲ −log p(j|π) + I(r_j; p_{J|π}) .   (3.5)
Averaging this inequality over p(j|π) leads to an upper bound tighter than the one
in Eq. (3.1),

Σ_j p(j|π) I(r_j) ≲ H(J|π) + Σ_j p(j|π) I(r_j; p_{J|π})   (3.6)

[tighter because I(r_j; p_{J|π}) ≲ I(p_{J|π})]. If, instead, one writes the left side of Eq. (3.4)
as I(p_{J|π}, r_j) ≃ I(p_{J|π}) + I(r_j|p_{J|π}), one obtains the inequality

I(r_j|p_{J|π}) ≲ −log p(j|π) .   (3.7)
Averaging this inequality over p(j|π) and combining the resulting upper bound
with the lower bound (3.3) yields the desired double inequality for the average
conditional algorithmic information,

H(J|π) ≤ Σ_j p(j|π) I(r_j | p_{J|π}) ≤ H(J|π) + O(1) , (3.8)

with the understanding that H(J|π) provides a strict lower bound on the average
conditional algorithmic information. Equation (3.9) translates to
There are, of course, algorithmically simple numbers with much smaller algorithmic
information, but they are few compared to the typical (random) numbers.
Suppose now that the memory assigns uniform probabilities p(j|π) = 1/J;
the statistical information has its maximum value H(J|π) = log J. The essential
information needed to enumerate J and assign uniform probabilities is the number
J, so the algorithmic prior information is
an estimate that accords with the double inequality (3.8). Eqs. (3.11) and (3.13)
combine to give an estimate for the mutual algorithmic information,
which is the length of a typical string r_j—an interpretation that makes sense of a
final estimate

I(p_{J|π} | r_j) = I(p_{J|π}) − I(r_j; p_{J|π}) ≈ log J . (3.15)
The estimate for the algorithmic prior information in this first example is driven by
the assumption that J is a typical (random) number. As a second example, let the
strings r_j be the binary strings of length N ≫ 1; then J = 2^N is algorithmically simple.
Suppose again that the memory assigns uniform probabilities, so that H(J|π) =
N = log J. Proceeding as in the first example, one can estimate the various kinds
of algorithmic information for typical strings r_j:
The key difference lies in the estimate for the algorithmic prior information I(p_{J|π}):
as in the first example, the essential information needed to enumerate J and assign
uniform probabilities is the number J = 2^N, but here J can be specified by giving
the number N, which requires only log N bits (potential terms of order log log N are
ignored). In this example, it takes much less information to generate the probability
assignment than to specify a typical string. The uniform probabilities are concisely
describable, and the double inequality (3.1) is tight.
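The asymmetry can be checked with a toy count (the value of N here is an arbitrary illustrative choice): describing the uniform ensemble over length-N binary strings costs only about log₂ N bits—enough to transmit the number N itself—while describing a typical member string costs N bits:

```python
import math

N = 1_000_000          # length of the binary strings; the ensemble has 2**N members

bits_for_typical_string = N                  # a typical (random) string is incompressible
bits_for_ensemble = math.ceil(math.log2(N))  # enough bits to transmit the number N itself

assert bits_for_ensemble < bits_for_typical_string
print(bits_for_ensemble)  # 20
```

Twenty bits describe the whole ensemble, while any one typical member needs a million.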
Suppose that, instead of uniform probabilities, the memory assigns probabilities
of 1/2, corresponding to H(J|π) = 1, to each of two strings, q_0 and q_1. For a case of
algorithmically simple strings—let q_0 be the string of N 0's and q_1 be the string of
N 1's—the estimates for algorithmic information become
where α can be 0 or 1. In contrast, for two typical strings that are algorithmically
independent—i.e., I(q_0; q_1) ≈ 0—and, more generally, for M algorithmically
independent typical strings of length N assigned equal probabilities 1/M, the
estimates become

I(q_α) ≈ N + log N ,   I(p_{J|π}) ≈ MN + log N ,
I(q_α; p_{J|π}) ≈ N + log N ,
I(q_α | p_{J|π}) ≈ log M ,   I(p_{J|π} | q_α) ≈ (M − 1)N ,
(3.19)

where α = 1, …, M. In these last three cases, I include only leading-order contributions that can be reliably estimated. In the latter two cases, it takes more—in
the last case, much more—information to assign probabilities than to specify any
single string; as a consequence, the double inequality (3.1) becomes meaningless.
104 Carlton M. Caves
stores the system energy to n ≪ log Ξ binary digits, corresponding to energy
resolution δE = 2^{−n+1}E. The resulting number of "accessible" states, J = ρ(E)δE,
corresponds to statistical information

H(J|π) = log J = log(ρ(E)δE) = S_2(E) − n + 1 , (3.20)

where

S_2(E) = log(ρ(E)E) = log Ξ (3.21)

is the microcanonical thermodynamic entropy in bits. For a macroscopic system
the entropy has typical size S_2(E) ∼ 10^{23}; notice that the length of a typical
specifying string is S_2(E_j) bits. The statistical information H(J|π) is often called
the entropy10,21; for any remotely reasonable value of n, of course, it doesn't matter
whether S_2(E) or H(J|π) is called entropy. Indeed, the success of statistical physics
depends on this indifference. The motivation for calling S_2(E) the thermodynamic
entropy is that it doesn't depend on the subjective resolution δE.
To specify a typical microstate j, one gives the length of the specifying string
r_j, which requires log log Ξ_j bits, and then gives the entire string r_j, which requires
log Ξ_j bits. Hence, the absolute algorithmic information of a typical microstate j is

I(r_j) = log Ξ_j + log log Ξ_j = S_2(E_j) + log S_2(E_j) . (3.22)
The Hamiltonian and boundary conditions, which are algorithmically simple, can be
used to calculate the eigenvalues, but one should not conclude, as a consequence,
that the eigenvalues are algorithmically simple. Suppose a universal computer is
handed the Hamiltonian and begins calculating energy eigenvalues from the ground
state up. If the computer is to generate a particular eigenvalue Eh it needs suffi-
cient instructions to recognize that eigenvalue and halt. The algorithmic information
content of these instructions is typically log(number of eigenstates with eigenvalue
smaller than E1).--.1og(p(Ej)Ei) = S2(Ej), in agreement with Eq. (3.22). Zurek27
uses this argument to show how algorithmic information increases during the tem-
poral evolution of a classical ergodic Hamiltonian system.
The simplicity of the Hamiltonian and boundary conditions manifests itself in
generating simple classes of eigenvalues. Indeed, to assign the microcanonical probabilities, the memory must store the Hamiltonian and boundary conditions (I_0 bits)
and the energy to n significant binary digits (n bits); in addition, it must know the
number of microstates to be included, J = ρ(E)δE = 2^{H(J|π)}, which requires only
log H(J|π) bits, because the number of microstates is algorithmically simple. These
last log H(J|π) bits can also be viewed as specifying the energy range in natural
units or as giving the number of zeroes that follow the n significant digits in the binary representation of E. With this information a universal computer can calculate
the energy eigenvalues within the energy range δE and assign them uniform probabilities; i.e., it can calculate the n-tuple p_{J|π} for the microcanonical probabilities.
Thus the algorithmic prior information for the microcanonical ensemble is

I(p_{J|π}) ≈ I_0 + n + log H(J|π) = I_0 + n + log S_2(E) . (3.23)
Each significant binary digit in E, beyond the first few, corresponds to a bit of
reduction in the statistical information (3.20). The term log S_2(E) in Eq. (3.23)
has the appealing interpretation that it is the number of bits needed to say what
the entropy is.
The microcanonical probabilities are concisely describable—i.e., I(p_{J|π}) ≪
H(J|π). Indeed, any probability assignment for a macroscopic system must have
algorithmic prior information much smaller than the algorithmic information of a
typical microstate; otherwise, no practical memory could store the required algo-
rithmic prior information.
The conditional algorithmic information of a typical microstate j can be estimated to be

I(r_j | p_{J|π}) ≈ log(ρ(E)δE) = H(J|π) = S_2(E) − n , (3.24)

an estimate that accords with the double inequality (3.8) and that leads to further
estimates,

I(r_j; p_{J|π}) ≈ n + log S_2(E) ,   I(p_{J|π} | r_j) ≈ I_0 . (3.25)
p(j|k,π) = δ_{k,g(j)} p(j|π)/p(k|π) , (3.28)

where g(j) denotes the branch that contains state j.
The conditional probability (3.26) also implies three equivalent conditions on statistical information,
any of which could be taken as the definition of a tree search. If the memory
finds that the system occupies branch k, the reduction in statistical information is
H(J|π) − H(J|k,π), which has average value
The argument that leads to Eq. (3.7), modified so that it starts with the minimal
program for p_{J,K|π} and so that the sorting and coding algorithms apply to the sets
J_k, yields the inequality
Combining the resulting averaged upper bound with the lower bound (3.31), one
obtains the desired double inequality involving H(K|π),
which implies that H(K|π) and Σ_k p(k|π) I(p_{J|k,π} | p_{J,K|π}) are equal to within
O(1). The double inequality (3.33) establishes the information balance alluded to
earlier: when the memory discovers that the system occupies branch k, the average
reduction in statistical information is essentially equal to the average algorithmic information that the memory acquires. The lower bound in Eq. (3.33) applies directly
to an analysis of Maxwell's demons.
The first approach works out from the trunk of the tree to the initial branches.
The second approach starts at the terminal branches and works back to the
initial branches. First rewrite the double inequality (3.8) using the prior information
p_{J,K|π} that is now appropriate:
Next find a double inequality that involves the conditional statistical information
H(J|k,π). The exact analogue of Eq. (3.8) involves an average of the conditional
algorithmic information I(r_j | p_{J|k,π}) over p(j|k,π), but the same double inequality
holds if one also includes the prior information p_{J,K|π} in the information given the
oracle. The result is a double inequality
Treating the double inequalities (3.34) and (3.35) as O(1) equalities, one finds, after
a bit of manipulation, the O(1) equality
Recalling that H(K|π) = H(J; K|π) leads to an attractive interpretation: the mutual
statistical information between J and K is essentially the same as the average
mutual algorithmic information (given the prior information) between state j and
the branch g(j) that branches to j.
That the second approach gives the same O(1) equality as the first follows from
noting that I(p_{J|g(j),π} | p_{J,K|π}, r_j) ≈ 0, since in the presence of the prior
information p_{J,K|π}, specification of a state j provides enough information to describe the
initial branch g(j) that branches to j. As a consequence, the mutual algorithmic
information satisfies the O(1) equality
Zurek27 takes this analysis a step further by allowing the memory to be a "de-
mon," which can replace its record by an algorithmically simpler one. For example,
should the demon-memory find all the molecules on the left—admittedly a rare
occurrence—it could store the record in the much shorter form of a minimal pro-
gram for generating a string of N 0's, and it could complete the cycle by erasing
far fewer than N bits. Although it is generally not possible to find the minimal
program for a given string or prove that a program that works is the minimal one,
it is interesting, nonetheless, to inquire whether a demon-memory that could find
or guess minimal programs could also violate the Second Law.
To address this question, it is useful to abstract the description of the N-box
Szilard engine to a more general engine. In doing so, there are two equivalent meth-
ods of analysis. The first method identifies a subsystem that does isothermal work
as it extracts heat from a heat reservoir. This subsystem is described by a canonical
ensemble with temperature determined by the reservoir temperature. In the case of
the Szilard engine, this subsystem consists of the N molecules. The second method
regards the subsystem and the reservoir as a single isolated system, described by
the microcanonical ensemble. The information-gathering (compression) phase of the
cycle restricts this system to a part of its "phase space," from which it "expands"
as it does adiabatic work during the expansion phase of the cycle.
The second method is simpler and better suited to the previous discussion of a
system with a finite number of accessible microstates. Thus consider again a system
with microstates j. This system begins its cycle in an initial macrostate, which has
energy E^{(i)} and thermodynamic entropy S^{(i)}. The corresponding temperature
T_1 is given by

1/T_1 = (dS/dE)|_{E=E^{(i)}} . (4.2)

S_k = H(J|k,π) k_B ln 2 , (4.3)
To extract work during the expansion phase, the memory uses a device like
the pistons in the Szilard engine. The memory must use its record to configure this
device so that it matches the observed sub-macrostate k—i.e., so that it "fits" the
configuration that some subsystem has when the system is in sub-macrostate k. For
the Szilard engine, this means inserting the pistons into the empty sides of the boxes.
The subsystem does isothermal work as it "expands" to its original configuration;
equivalently, the entire system does adiabatic work W_k^{(+)}. The entire system is left
in a final macrostate, which has reduced energy,
The final energy E^{(f_k)} and final entropy S^{(f_k)} can depend on the observed
sub-macrostate k (they don't for the simple Szilard engine, but they would if the
Szilard engine were modified so that the partitions divided the boxes into unequal
volumes).
The work extracted can be related to the change in entropy,

W_k^{(+)} = E^{(i)} − E^{(f_k)} = [S^{(i)} − S^{(f_k)}] / (dS/dE)|_{E=E^{(i)}} = T_1 [S^{(i)} − S^{(f_k)}] , (4.6)

W_k^{(+)} / (k_B T_1 ln 2) = H(J|π) − H(J|k,π) = −log p(k|π) . (4.7)
The rightmost equality, though not true for a general one-stage tree search, holds
here because the probability to find the system in sub-macrostate k is p(k|π) = J_k/J.
The average work extracted during the cycle is
A conventional thermodynamic analysis views the memory from the outside, assigns
probability p(kI 7) to its various records k, and concludes that its entropy increases
during a cycle from zero to
η = Σ_k p(k|π)[W_k^{(+)} − W_k^{(−)}] / Σ_k p(k|π) W_k^{(+)} ≤ 1 − T_2/T_1 , (4.11)

which is Kelvin's efficiency limit for a heat engine operating between two heat
reservoirs.
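As a quick numerical check of the limit (4.11), with illustrative reservoir temperatures (not from the text):

```python
def kelvin_efficiency_limit(T1, T2):
    """Efficiency bound (4.11): eta <= 1 - T2/T1 for an engine run
    between a hot reservoir at T1 and a cold one at T2 (kelvin)."""
    if not T1 > T2 > 0:
        raise ValueError("need T1 > T2 > 0")
    return 1.0 - T2 / T1

# Illustrative temperatures (not from the text):
eta_max = kelvin_efficiency_limit(600.0, 300.0)
print(eta_max)  # 0.5
```

Halving the cold-reservoir temperature relative to the hot one caps the efficiency at one half, however the engine is operated.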
Zurek27,28 advocates taking the "inside view"—i.e., analyzing the cycle from
the point of view of a demon-memory, which is smart enough to shorten, if it can, its
record of the sub-macrostate k. It is then crucial to understand precisely what the
demon knows at each stage of the cycle—what does the demon know, and when does
it know it? At the start of the cycle, the demon knows enough to specify the initial
system macrostate, but it must also know how to describe all the sub-macrostates,
else it could not be ready with a device that can be configured to "fit" all possible
sub-macrostates. Thus, the demon knows enough to generate a list of the accessible
microstates, assign them equal probabilities, and group them into the subsets J_k.
In the notation used above, the demon stores enough information to compute the
n-tuple p^{(i)}_{J,K|π}, where the superscript (i) designates the initial macrostate. After
the cycle, the demon, to be ready for the next cycle, must know enough to compute
the corresponding n-tuple p^{(f)}_{J,K|π} for the final macrostate. If the system is a heat
reservoir, however, the n-tuples p^{(i)}_{J,K|π} and p^{(f)}_{J,K|π} are the same, the
distinguishing superscripts may be dropped, and one may refer to the demon's "standard state"
at the beginning (or end) of a cycle. The minimum amount of information that the
demon stores in its standard state is the algorithmic prior information I(p_{J,K|π}).
To see why the two n-tuples are the same, refer to the previous discussion of
the microcanonical ensemble: the n-tuples are identical if the energy change W_k^{(+)}
is much smaller than the resolution δE = 2^{−n+1}E^{(i)} used to define the microcanonical ensemble—i.e., if 1 ≫ 2^n W_k^{(+)}/E^{(i)} ∼ 2^n (C T_1/E^{(i)}) |δT_1/T_1|, where C
is the system's heat capacity. This condition, essentially the same as the condition
|δT_1/T_1| ≪ 1 for the system to be a heat reservoir, is satisfied by a sufficiently
macroscopic system. Treating the initial and final macrostates as the same ignores
the changes in system energy and entropy. These changes are essential for analyzing
the operation of the engine, but they are completely obscured in defining the
system macrostates by any reasonable energy resolution δE.
When the demon observes and records sub-macrostate k, it must store enough
information, beyond the prior information, to describe that sub-macrostate—i.e.,
to compute the n-tuple p_{J|k,π} that lists the states in J_k and assigns them equal
conditional probabilities p(j|k,π). To get back to its standard state, ready for the
next cycle, the demon must erase information from its memory. Zurek26 shows that
the minimum number of bits that must be erased is (in the notation used here) the
conditional algorithmic information I(p_{J|k,π} | p_{J,K|π}). Zurek's argument is subtle
and relies on the properties of reversible computers, so it is fortunate that his
conclusion, as he emphasizes, makes eminent good sense: the minimum number of
bits that must be erased is the minimum amount of information the demon needs,
beyond the prior information, to describe sub-macrostate k.
Invoking Landauer's principle, one can now say that the demon must supply work

W_k^{(−)} ≥ I(p_{J|k,π} | p_{J,K|π}) k_B T_2 ln 2 (4.12)

to erase the useless bits and return to its standard state, if the bits are erased
into an environment at temperature T_2. Should the demon find the system in an
algorithmically simple sub-macrostate—i.e., I(p_{J|k,π} | p_{J,K|π}) ≪ H(K|π)—it can
beat Kelvin's efficiency limit, but on the average—the only sense in which the
Second Law is meant to apply—the demon must supply work

Σ_k p(k|π) W_k^{(−)} ≥ Σ_k p(k|π) I(p_{J|k,π} | p_{J,K|π}) k_B T_2 ln 2 ≥ H(K|π) k_B T_2 ln 2 , (4.13)

which is identical to Eq. (4.10) and leads again to Kelvin's efficiency (4.11). The
crucial last inequality in Eq. (4.13) follows from the strict left inequality in Eq. (3.33)
and justifies the attention devoted to inequalities for a tree search.
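The bound (4.12) is easy to evaluate numerically. A minimal sketch, with an illustrative record length and temperature (only the Boltzmann constant is a fixed physical input):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)

def min_erasure_work(bits, T2):
    """Landauer bound, as in Eq. (4.12): erasing `bits` bits into an
    environment at temperature T2 costs at least bits * k_B * T2 * ln 2."""
    return bits * K_B * T2 * math.log(2)

# Illustrative record: 1000 bits erased at 300 K (numbers not from the text).
W = min_erasure_work(1000, 300.0)
print(W)  # about 2.9e-18 J
```

The cost is minuscule on laboratory scales, but it is strictly positive, which is what closes the demon's cycle against the Second Law.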
ACKNOWLEDGMENTS
This work was supported in part by the Faculty Research and Innovation Fund at
the University of Southern California.
REFERENCES
1. Bennett, C. H. "The Thermodynamics of Computation—A Review." Intl. J.
Theor. Phys. 21 (1982):905-940.
2. Bennett, C. H. "Demons, Engines, and the Second Law." Sci. Am. 257(5)
(November 1987):108-116.
3. Bennett, C. H. "Notes on the History of Reversible Computation." IBM J.
Res. Develop. 32 (1988):16-23.
4. Chaitin, G. J. "On the Length of Programs for Computing Finite Binary Se-
quences." J. Assoc. Comp. Mach. 13 (1966):547-569.
5. Chaitin, G. J. "A Theory of Program Size Formally Identical to Information
Theory." J. Assoc. Comp. Mach. 22 (1975):329-340.
6. Chaitin, G. J. "Algorithmic Information Theory." IBM J. Res. Develop. 21
(1977):350-359.
7. Chaitin, G. J. Information, Randomness, and Incompleteness: Papers on Al-
gorithmic Information Theory. Singapore: World Scientific, 1987. A collection
of Chaitin's papers.
8. Gallager, R. G. Information Theory and Reliable Communication. New York:
Wiley, 1968. An introduction to statistical information and its application to
communication theory.
9. Gibbs, J. W. "Elementary Principles in Statistical Mechanics." In The Col-
lected Works of J. Willard Gibbs, Vol. II, Part One. New Haven: Yale Univer-
sity, 1948.
10. Huang, K. Statistical Mechanics. New York: Wiley, 1963. A standard text-
book, more advanced than Ref. 21.
11. Huffman, D. A. "A Method for the Construction of Minimum Redundancy
Codes." Proc. IRE 40 (1952):1098-1101.
12. Jaynes, E. T. Papers on Probability, Statistics and Statistical Physics, edited
by R. D. Rosenkrantz. Dordrecht, Holland: Reidel, 1982. A collection of
Jaynes's papers, unrivaled for clarity and persuasiveness.
13. Jaynes, E. T. "Clearing up Mysteries: The Original Goal." In Maximum En-
tropy and Bayesian Methods, edited by J. Skilling. Dordrecht, Holland: Rei-
del, 1989,1-27.
14. Jaynes, E. T. "Probability in Quantum Theory." This volume.
15. Kolmogorov, A. N. "Three Approaches to the Quantitative Definition of In-
formation." Problemy Peredachi Informatsii 1(1) (1965):3-11. English trans-
lation in Prob. Inform. Transmission 1 (1965):1-7.
16. Kolmogorov, A. N. "Logical Basis for Information Theory and Probability
Theory." IEEE Trans. Inform. Theory IT-14 (1968):662-664.
17. Landauer, R. "Irreversibility and Heat Generation in the Computing Pro-
cess." IBM J. Res. Develop. 5 (1961):183-191.
18. Landauer, R. "Dissipation and Noise Immunity in Computation and Commu-
nication." Nature 335 (1988):779-784.
Complexity of Models
1. INTRODUCTION
The fundamental task of trying to learn the properties of the mechanism generating
a set of observed data is called the problem of model construction or modeling.
This in general involves sorting out the relevant variables, which may or may not
be directly observed as a part of the data, and trying to discover possible causal
relationships amongst them. Modeling problems are at the heart of all scientific
activity, and it is no wonder that they pose formidable difficulties even in the simplest
cases, such as the curve-fitting problems, which have been a subject of systematic
study at least since the time of Gauss.
There have been persistent difficulties in properly formalizing the intuitive ideas
that we have about modeling, at the root of which—we think—is the question of
how to deal with the complexity of the models themselves. An obvious attempt has
been to avoid the issue by declaring that the data have been generated by some
relatively easy to describe "true" machinery, a "law," and the inevitable deviations
are ascribed to "noise" stemming from various sources. While such a strategy works
to a degree in simple situations, we are left in the cold when our preconceived ideas
about the "true" machinery fail to produce a "law" with which the deviations could
with good conscience be regarded as just instrument noise. To be sure, there are
The main task then is one of encoding, or rather estimating the code length
with which the encoding could be done while using the models in each class. What
is particularly fortunate and perhaps surprising is that we can provide excellent
estimates of the code length for virtually all the interesting classes of models, and
the result changes the way that we can do statistics.
2. MODELS
Intuitively, the idea of a model is a mathematical description of the relationship
between the selected variables with a number of free parameters to be fitted to
the data. This implicitly recognizes the fact that no matter what model we pick,
the "law" it expresses for the observed data does not quite hold—whether because
of "measurement errors" or other sources that we failed to include. There is no
fundamental difference between "random" errors and the "systematic" errors due
to our failure to include all the relevant variables. Both are manifestations of our
inability to fully explain the data. With this in mind, we propose the sweeping claim
that "all models are fundamentally probabilistic," or at any rate for our purposes
they can be expressed as such. More specifically, consider the observed data of the
kind (y, x) = (V', xn) = (Yi, x1), • • • , (Yn, xn), where we think the xt- variables to
influence the yt- variables, both in general having several components. The first
part in the model or, more properly, model class is a "deterministic law" pt =
Fw,v-1 1.6,,,) which allows us to predict the observed numbers yt as a parametric
function of the observations available at t. The parameter 61 is a collection of k
real-valued components 81 , . • - ,Ok , and their number is also to be determined. The
second part in the model is a probability distribution P(yt 10t) with which we model
the deviations, taken as independent since we imagine the deterministic part to
account for the dependencies between yt and yr for t 0 r. We can therefore write
the probability that such a model assigns to the observed data as
n
P(ylx, 9) = 11
t=1
P(yt It).
3. STOCHASTIC COMPLEXITY
We begin by recalling Shannon's coding theorem. Let C be a one-to-one coding
function from a discrete set X into the set B* of all finite binary strings. Let the
length, i.e., the number of binary digits, of C(x) be L(x). A code is said to be a
prefix code if the lengths satisfy the fundamental Kraft inequality:

Σ_{x∈X} 2^{−L(x)} ≤ 1 . (3.1)

Shannon's theorem states that for any prefix code the mean code length is bounded
from below by the entropy of the distribution P,

Σ_x P(x)L(x) ≥ −Σ_x P(x) log P(x) , (3.2)

and that the lower bound can be reached only if the lengths satisfy the equality
L(x) = −log P(x) for every x. In this sense, then, we know how the code should
be designed for a given distribution; because of this, we regard −log P(x) as the
Shannon complexity of x relative to the "model" P.
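The theorem can be verified numerically for a small illustrative distribution (not from the text): any lengths satisfying the Kraft inequality give a mean length at least the entropy, with equality at L(x) = −log P(x):

```python
import math

# Illustrative dyadic distribution (not from the text).
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

entropy = -sum(p * math.log2(p) for p in P.values())

# Ideal lengths L(x) = -log2 P(x); integers here, so a prefix code
# attaining them exists (e.g., a -> 0, b -> 10, c -> 110, d -> 111).
ideal = {s: -math.log2(p) for s, p in P.items()}
assert sum(2.0 ** -l for l in ideal.values()) <= 1.0   # Kraft inequality
mean_ideal = sum(P[s] * ideal[s] for s in P)
assert abs(mean_ideal - entropy) < 1e-12               # lower bound attained

# Any other admissible length assignment does no better.
other = {"a": 2, "b": 2, "c": 3, "d": 3}
assert sum(2.0 ** -l for l in other.values()) <= 1.0
assert sum(P[s] * other[s] for s in P) >= entropy
print(entropy)  # 1.75 bits
```

The dyadic probabilities are chosen so the ideal lengths are integers; for general distributions the entropy bound is approached to within one bit, as in the Shannon-Fano construction.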
Our task at hand is to generalize the just-outlined coding program to data
which are modeled not by a single distribution but by a whole class M = {P(y|x, θ)},
where θ ranges over some subset Ω^k of the k-dimensional Euclidean space, and y
and x denote sequences of numbers truncated so as to have a countable range.
First, for each fixed parameter value, we know from Shannon's work that it
takes about
L(y|x, θ) = −log P(y|x, θ) (3.3)
bits to encode the data. However, the decoding can be done only if the decoder
knows the parameter value that the encoder used. Whenever the parameters range
over the reals, it is clear that to describe them by a finite binary string they must be
truncated to a finite precision. For simplicity, take the precision to be the same,
δ = 2^{−q}, for all of them. Then we can write each parameter with q bits plus the number
of bits needed to write the integer part, which in fact turns out to be ignorable.
If θ̂ = θ̂(y|x) denotes the maximum likelihood estimate, which minimizes the code
length (3.3), then since the truncation of θ̂ to the precision δ may deviate from
the optimum by as much as δ in each component, the code length (3.3) after the
truncation is larger than the minimized code length. The larger the δ we pick, the
larger this increase will be in the worst case, while at the same time it will require
fewer bits to describe the truncated parameters. There is then an optimal worst-case
precision, which can be found by expanding Eq. (3.3) in a Taylor series about θ̂.
The result is that the optimal precision depends on the size of the observed data
set as follows: −log δ = (1/2) log n, and we get the total code length as

L(y|x, k) = −log P(y|x, θ̂) + (k/2) log n (3.4)
with or without prior knowledge. Having such a prior π(θ) we can eliminate the
inherent redundancy in the code length, and the result is as follows,15

I(y|x, M) = −log P(y|x, M) , (3.5)

where

P(y|x, M) = ∫ P(y|x, θ) π(θ) dθ . (3.6)

We call Eq. (3.5) the stochastic complexity of the data, given the model class M.
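The two-part construction of the preceding paragraphs—the data cost at the fitted parameters plus roughly (1/2) log n bits per truncated parameter—can be sketched as follows (the numbers are illustrative, and the sketch omits the O(1) terms):

```python
import math

def two_part_code_length(data_bits, k, n):
    """Two-part code length: -log P(y|x, theta_hat) plus about
    (1/2) log2 n bits for each of the k truncated parameters."""
    return data_bits + 0.5 * k * math.log2(n)

# Illustrative numbers: n = 100 observations, k = 3 parameters,
# and 250 bits for the data at the maximum-likelihood estimate.
total = two_part_code_length(250.0, 3, 100)
print(round(total, 2))  # 259.97
```

Doubling the data set raises the parameter cost by only half a bit per parameter, which is why the penalty term matters mainly when competing model classes differ in k.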
As a justification for calling Eq. (3.5) the stochastic complexity, we now describe
in an informal manner a theorem which may be viewed as an extension of
Shannon's coding theorem. For a precise statement we refer to Rissanen.12 The
theorem hinges on the general assumption about the models, taken here to be defined
by the densities f(y^n|x^n, θ), that there must be some estimator θ̂(y^n|x^n) which
converges in probability to the parameter θ defining the data-generating distribution.
The convergence rate for very general classes of models is 1/√n per parameter. It
follows then that, no matter which distribution—say, a density function g(y^n|x^n)—one
picks, the following inequality holds:

−E_θ log g(y^n|x^n) ≥ −E_θ log f(y^n|x^n, θ) + ((k − ε)/2) log n (3.7)

for all positive numbers ε and all θ except some in a set whose volume goes to zero
as n grows.
If we take the density g as the one resulting from our best efforts to estimate the
data-generating distribution, we see that not only is the left-hand side bounded
from below by the entropy, but it must exceed it by a definite amount, which
simply represents the uncertainty inherent in any estimation process. If we divide
both sides by n, we see that this uncertainty (but not the first term) reduces to zero
at the given maximum rate as we get more data and learn more about the data-generating
machinery. We also see at once that Eq. (3.4) as a code length cannot be
improved upon asymptotically. Further, one can show under general conditions,14
that Eq. (3.5) is smaller than Eq. (3.4) for large n, and hence, in particular, it is
also asymptotically optimal.
The general objective in model building is to search for a model class which
minimizes Eq. (3.5). After the class is found, including the number of its free pa-
rameters, we can find the corresponding optimal parameter values and hence the
optimal model, if desired. Frequently, the complexity (3.5) is expressed in terms of
a density function, and in such a case it does not immediately represent a real code
length. It sometimes happens that the density function, written now as f(ylx, M),
is very peaked, which implies that the simple process of calculating the probability
of the truncated data by f (Mx, M)bn may be too crude and may lead to incorrect
code length for the data.
We illustrate the computation of the stochastic complexity with the polynomial
curve-fitting probleni on data from Picard and Cook.8
EXAMPLE 1
From 20 flocks of Canada geese the numbers x_i, i = 1, …, 20, of adult birds were
estimated as 10, 10, 12, 20, 40, 40, 30, 30, 20, 20, 18, 35, 35, 35, 30, 50, 30, 30, 45,
and 30, respectively. The same flocks were also photographed, from which the true
numbers of adult birds y_i, i = 1, …, 20, were counted. Written in the same order
as the corresponding estimates, they are as follows: 9, 11, 14, 26, 57, 56, 38, 38, 22,
22, 18, 43, 42, 42, 34, 62, 30, 30, 48, and 25. We wish to fit a polynomial predictor
as in Eq. (2.1). With a quadratic deviation measure (y_t − ŷ_t)²/τ, where τ is a
parameter, the distribution (2.2) is gaussian with mean ŷ_t and variance τ/2. We
select the prior π(θ) also as gaussian, with mean zero and covariance (τ/c)I, where
I is the k × k identity matrix and c a further "nuisance" parameter to be picked
in a moment. For τ we pick the so-called conjugate prior,2
where a is another "nuisance" parameter. The reason for choosing these priors is
simply that we can get the integral (3.6) written in a closed form. The two
nuisance parameters can then be selected so that the complexity is minimized,
which gives the final criterion
where K(n) depends only on n and can be dropped. Further, the elements of the
matrix X are given as x_{ij} = x_j^{i−1}, i = 1, …, k, j = 1, …, n, and R is the
minimized sum of the squared deviations Σ_t (y_t − ŷ_t)².
The stochastic complexities as a function of k come out as I(y|x, 1) = 123.5,
I(y|x, 2) = 93.7, I(y|x, 3) = 102.6, and I(y|x, 4) = 115.7. Hence, the minimizing
polynomial is linear. Its coefficients are given by θ̂_1 = −3.8, θ̂_2 = 1.3, and the
resulting line fits the data well. In fact, when the plot is made, most human
observers, using personal judgment, would pick this line rather than a higher-degree
polynomial as the best fit.
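The flavor of this computation can be reproduced with ordinary least squares on the data above. The sketch below is not the closed-form criterion of the text: it uses a BIC-style two-part code length, (n/2) log(R/n) + (k/2) log n in nats, as a stand-in, so its numbers differ from the stochastic complexities quoted above:

```python
import math

x = [10, 10, 12, 20, 40, 40, 30, 30, 20, 20, 18, 35, 35, 35, 30, 50, 30, 30, 45, 30]
y = [9, 11, 14, 26, 57, 56, 38, 38, 22, 22, 18, 43, 42, 42, 34, 62, 30, 30, 48, 25]
n = len(x)

def polyfit_rss(deg):
    """Least-squares polynomial fit of degree `deg` via the normal
    equations X'X a = X'y, with x_ij = x_j**(i-1); returns the residual
    sum of squares R."""
    k = deg + 1
    A = [[sum(xj ** (i + j) for xj in x) for j in range(k)] for i in range(k)]
    b = [sum((xj ** i) * yj for xj, yj in zip(x, y)) for i in range(k)]
    for c in range(k):                      # Gaussian elimination, partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p], b[c], b[p] = A[p], A[c], b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [u - f * v for u, v in zip(A[r], A[c])]
            b[r] -= f * b[c]
    a = [0.0] * k
    for r in reversed(range(k)):            # back substitution
        a[r] = (b[r] - sum(A[r][c] * a[c] for c in range(r + 1, k))) / A[r][r]
    pred = [sum(a[i] * xj ** i for i in range(k)) for xj in x]
    return sum((p - yj) ** 2 for p, yj in zip(pred, y))

def code_length(deg):
    """BIC-style two-part code length in nats for a gaussian model with
    k = deg + 1 fitted coefficients."""
    k = deg + 1
    return 0.5 * n * math.log(polyfit_rss(deg) / n) + 0.5 * k * math.log(n)

print([round(code_length(d), 1) for d in range(4)])
```

For the linear fit the residual sum of squares comes out near 386, consistent with the coefficients θ̂ = (−3.8, 1.3) quoted in the text, and the penalized code length of the linear model is far below that of the constant model.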
In virtually all applications the mean constraints are so selected that they equal
the actually measured values ā = A(x) = (A_1(x), …, A_k(x)), where x now denotes
the particular observed data. Then it is true that

max H(A) = log Z(λ_ā) + λ_ā′ ā = min −log p(x|λ) . (4.4)

We thus see that the requirement of maximum entropy coincides with the
requirement of the shortest code length when the constraints are taken as known.
However, these are not known or unique in model building, and in order to be able
to compare fairly several conceivable suggestions, we should add to the code length
(4.4) the code length needed to describe these suggestions, which is nothing but
the general MDL principle. In an approximation, then, we should minimize
REFERENCES
1. Chaitin, G. J. Algorithmic Information Theory. Cambridge: Cambridge Uni-
versity Press, 1987.
2. Cox, D. R., and D. V. Hinkley. Theoretical Statistics. London: Chapman and
Hall, 1974.
3. Jaynes, E. "Information Theory and Statistical Mechanics." Phys. Rev. 106
(1957):620.
4. Jaynes, E. "Information Theory and Statistical Mechanics." Phys. Rev. 108
(1957):171.
5. Jaynes, E. "On the Rationale of Maximum Entropy Methods." Proc. of
IEEE, Special Issue on Spectral Estimation, edited by S. Haykin, 70
(1982):939-952.
6. Kemeny, J. "The Use of Simplicity in Induction." Phil. Rev. 62 (1953):391-
408.
7. Kolmogorov, A. N. "Three Approaches to the Quantitative Definition of In-
formation." Problems of Information Transmission 1 (1965):4-7.
8. Picard, R. R., and R. D. Cook. "Cross-Validation of Regression Models."
JASA 79(387) (1984):575-583.
9. Rissanen, J. "Modeling by Shortest Data Description." Automatica 14
(1978):465-471.
10. Rissanen, J. "A Universal Prior for Integers and Estimation by Minimum De-
scription Length." Ann. of Stat. 11 (1983):416-431.
11. Rissanen, J. "Universal Coding, Information, Prediction, and Estimation."
IEEE Trans. Inf. Theory IT-30 (1984):629-636.
12. Rissanen, J. "Stochastic Complexity and Modeling." Annals of Statistics 14
(1986):1080-1100.
13. Rissanen, J. "A Predictive Least Squares Principle." IMA Journal of Mathe-
matical Control and Information, 3(2-3) (1986):211-222.
14. Rissanen, J. "Stochastic Complexity." The Journal of the Royal Statistical
Society 49 (1987):223-239 and 252-265 (with discussions).
15. Rissanen, J. Stochastic Complexity in Statistical Inquiry. New Jersey: World
Scientific Publ. Co., 1989.
16. Schwarz, G. "Estimating the Dimension of a Model." Ann. of Stat. 6
(1978):461-464.
17. Solomonoff, R. J. "A Formal Theory of Inductive Inference." Part I, Informa-
tion and Control 7 (1964):1-22.
18. Solomonoff, R. J. "A Formal Theory of Inductive Inference." Part II, Infor-
mation and Control 7 (1964):224-254.
C. H. Woo
Center for Theoretical Physics, Department of Physics and Astronomy, University of Mary-
land, College Park, MD 20742
2N numbers. Even after one uses Newton's law to obtain an economical model
to explain this specific data set, the input still contains N + O(1) numbers. The
number of bits assigned to the laws of motion is in the O(1) term, which would
be insignificant compared to N when N is large. In this example of laboratory
experiment, the N numbers corresponding to the boundary condition (the initial
heights) are arbitrary and uninteresting, and it makes sense to ignore the boundary
condition and concentrate on the laws. But, in the case where a description of our
specific world is contemplated, as in quantum cosmology or in accounting for any
natural feature which is not an inevitable result of the laws alone, the boundary
condition represents indispensable input information. Then a minimal input pro-
gram, which contains the information about such specific features, can be very long
indeed. When the program itself is much longer than the instruction for simulating
one computer on another, the notion of minimal programs becomes meaningful.
Since we will be concerned with the economy of models which include the
boundary conditions, we want to state clearly what we mean by "boundary condi-
tions": we include in boundary conditions any input information over and above the
laws of motion which is needed to describe the specific behavior of a system. Thus,
a "boundary condition" includes the initial condition, but can contain much more.
In particular, in a quantal description of a sequence of events in our specific world,
the input includes the initial density matrix (initial condition), the unitary evolu-
tion (the laws), and the information needed to specify one "consistent history"4,6,10
from among all the possible consistent histories; thus, "boundary condition" in this
case includes the first and the third type of information.
Suppose one considers the conditional probability for the occurrence of a certain
event in terms of the initial density matrix ρ_0 and a sequence of time-ordered
projections E_j(n_j) in the Heisenberg picture (where j refers to the nature of the
observable and n_j to the eigenvalue or a group of eigenvalues):

P(E_i(n_i)) = Tr(E_i(n_i) ··· E_1(n_1) ρ_0 E_1(n_1) ··· E_i(n_i)) /
              Tr(E_{i-1}(n_{i-1}) ··· E_1(n_1) ρ_0 E_1(n_1) ··· E_{i-1}(n_{i-1})).   (1)
As one traces the history (that is, repeats Eq. (1) for decreasing values of i), two
types of projections should be distinguished: those which yield conditional proba-
bilities near one and those which do not. Projections which affirm classical links,
to the extent that they are favored by the unitary evolution, are in the first cate-
gory, whereas the association of our world with one specific history following any
branching of histories is in the second category. It is the second type that signifi-
cantly changes the input information, since in this case the E_i(n_i), which becomes
part of our history, is not determined by being strongly favored by the dynamics
and the previous history, as is the case for the first type of projections. If both the
initial condition and the laws are simple, then almost all the information content of
our specific world arises from the second type of projections, that is, from amplified
quantum fluctuations. There is a prevailing attitude that the essence of quantum
Laws and Boundary Conditions 129
measurements resides in the first stage when a pure state becomes locally a mix-
ture, and not in the second stage when one alternative in the mixture becomes
identified with the actual outcome. The reasoning is that this second stage is "just
like what happens in classical probability." But in classical probability one accepts
the arbitrariness of the outcome of a coin toss because, one thought, the deter-
mining factors are numerous and microscopic. When the link between the theory
and the observations in our specific world still involves arbitrary elements even in
a putatively self-contained quantum theory, this fact deserves to be underscored
and the stage at which arbitrariness enters clearly identified. In any case, in terms
of contributions to the minimal program describing our specific world, it is in the
second stages that a substantial amount of algorithmic information enters.
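Eq. (1) can be checked numerically for a single qubit. In this sketch the initial condition, ρ_0 = |0><0|, and the two projectors (onto |+> and then onto |1>) are our own illustrative choices; the conditional probability of the second projection given the first comes out to 1/2:

```python
# 2x2 matrix helpers (plain nested lists; entries may be complex in general).
def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

rho0 = [[1.0, 0.0], [0.0, 0.0]]   # initial condition |0><0|
E1 = [[0.5, 0.5], [0.5, 0.5]]     # projector onto |+> = (|0>+|1>)/sqrt(2)
E2 = [[0.0, 0.0], [0.0, 1.0]]     # projector onto |1>

chain1 = mul(E1, mul(rho0, E1))   # E1 rho0 E1
chain2 = mul(E2, mul(chain1, E2)) # E2 E1 rho0 E1 E2
p = trace(chain2) / trace(chain1) # Eq. (1) with i = 2
```

Here neither projection is "strongly favored" by the dynamics: both outcomes of E1 are equally likely, so this step is of the second, information-injecting type.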
P(w) = Σ 2^(-|p|),   (2)

where the sum runs over all programs p which produce w as output. Let p*(w) be a
minimal-length program, and let |p*(w)| = I(w). I(w) is called the
algorithmic information (or algorithmic complexity) of w. It is also approximately
equal to the algorithmic entropy H(w) = -log2 P(w):

I(w) = H(w) + O(1).   (3)
Although Eq. (3) says that an individual string with a short minimal program is
overwhelmingly more likely to be produced compared to one with a significantly
longer minimal program, one must reckon with the fact that there are many more
long programs than short ones. When one compares categories of output strings,
the highly compressible category versus the incompressible category, it is no longer
true that the former is overwhelmingly favored. Let us define the expectation value
E(a) of an attribute a(w) of strings w by E(a) ≡ Σ_w a(w)P(w), and denote the
length of w by n(w); then from the Appendix we see that when we consider the
limit of long strings as n(w) → ∞:
(i) shows that there is a non-negligible probability for the occurrence of highly
compressible strings, whereas (ii) shows that there is also a non-negligible proba-
bility for the occurrence of nearly incompressible strings. In short, the probability
for categories is broadly distributed. A broad distribution is not very informative;
still, it should be noted that this conclusion is radically different from the naive
expectation that, because there are more long programs, the chance of getting a
short program at random is exponentially small. What the algorithmic information
theory brings out is that there are also factors working in favor of the occurrence of
outputs with short programs. (We will discuss a physical analog of this mechanism
later.)
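The weight 2^(-|p|) attached to programs can be explored empirically with a toy machine. The sketch below feeds short random programs to a small Brainfuck-style interpreter — our own stand-in for a universal computer, with a step bound in place of true halting detection — and tallies the outputs. Simple outputs such as the empty string dominate, while a broad tail of longer outputs survives, in the spirit of the broad distribution over categories described above:

```python
import random

def run(prog, max_steps=500):
    """Run a tiny Brainfuck-like program on a 32-cell tape; return its output
    string, or None if brackets are unmatched or the step budget is exceeded."""
    match, stack = {}, []
    for i, ch in enumerate(prog):
        if ch == '[':
            stack.append(i)
        elif ch == ']':
            if not stack:
                return None
            j = stack.pop()
            match[i], match[j] = j, i
    if stack:
        return None
    tape, ptr, pc, out, steps = [0] * 32, 0, 0, [], 0
    while pc < len(prog) and steps < max_steps:
        ch = prog[pc]
        if ch == '+': tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == '-': tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == '>': ptr = (ptr + 1) % 32
        elif ch == '<': ptr = (ptr - 1) % 32
        elif ch == '.': out.append(tape[ptr])
        elif ch == '[' and tape[ptr] == 0: pc = match[pc]
        elif ch == ']' and tape[ptr] != 0: pc = match[pc]
        pc += 1
        steps += 1
    if steps >= max_steps:
        return None
    return ''.join(chr(c) for c in out)

random.seed(1)
alphabet = '+-<>[].'
counts = {}
for _ in range(20000):
    prog = ''.join(random.choice(alphabet) for _ in range(8))
    w = run(prog)
    if w is not None:
        counts[w] = counts.get(w, 0) + 1

ranked = sorted(counts, key=counts.get, reverse=True)
```

With this machine and seed, the most frequent outputs are the shortest ones, yet many distinct longer outputs also occur with non-negligible total weight.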
Once one finds a minimal or near-minimal program for a given string w, it has
not only descriptive value but also projective value, in the sense that it will be
useful in coding the probable extensions of w. By an extension we mean here the
concatenation (w, x) of w with another string x. From Chaitin,1 theorem 15b:
where the ratio in the braces is the conditional probability that w is extended into
(w, x), c is O(1), and I(x/w) is the length of the minimal program for producing
x if the computer already has available to it p*(w). If the same computer is used
for producing w and (w, x), and if there are not many near-minimal programs for
them, even r does not differ from unity by many orders of magnitude. Then Eq. (6)
implies that the minimal program of the original string w facilitates the economical
coding of its probable extensions.
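A compression analogue of this projective value can be demonstrated with zlib's preset-dictionary mode: supplying w as a dictionary plays the role of the computer already having p*(w) available, and the extension x then codes into fewer bits. Compressed sizes are only a loose proxy for I(x) and I(x/w), and the repeated phrase is an arbitrary illustrative choice:

```python
import zlib

# w is the already-modeled string; x is an extension statistically similar to w.
w = b"the quick brown fox jumps over the lazy dog. " * 20
x = b"the quick brown fox jumps over the lazy dog. " * 5

def compressed_size(data, dictionary=None):
    """Size of data under zlib, optionally conditioned on a preset dictionary."""
    if dictionary is None:
        c = zlib.compressobj(9)
    else:
        c = zlib.compressobj(9, zdict=dictionary)
    return len(c.compress(data) + c.flush())

unconditional = compressed_size(x)              # a proxy for I(x)
conditional = compressed_size(x, dictionary=w)  # a proxy for I(x/w)
```

Conditioned on w, the extension needs only back-references into the dictionary, so its code length drops sharply — the compression counterpart of a minimal program facilitating economical coding of probable extensions.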
Having looked at the a priori probabilities, we now return to the idea of regard-
ing certain facts about our specific world as a data string, and its minimal program
as a theoretical model. Obviously we will never be able to deal with all the facts—
the cosmologists, for instance, ignore many of the fine details of our universe, but
it is always understood that more features can be added later. By addition we do
not mean just an extension in time, but also improvements in accuracy, inclusion of
previously ignored variables, etc. Then, if we envision that at some point the data
that one tries to understand are rich enough in information so that one expects,
say, I(w) > 10^6 bits, it makes sense to search for maximum economy even in
modeling the boundary conditions. Although a near-minimal program does not have the
universal applicability of the laws, it is like the laws in that its utility goes beyond
a mere economical summary of the data in question and may extend to data not
yet studied.
There are different ways whereby noises can become finitely effective; it can be
shown that some one-dimensional cellular arrays possess computational universality
with the effective parts of noises as inputs, but the models that we have found
so far are somewhat artificial. No doubt the list of physical systems capable of
universal computation will continue to grow, and the relevance of these a priori
probabilities will be assured if there are many such systems with noises as inputs.
If, for example, some spin systems have computational universality with finitely
effective noises as inputs, then one can predict how the complexities are distributed
for the configurations resulting from a fixed initial state.
In this kind of application, it is essential that many similar copies of a specific
type of system exist, so that the probability distributions have empirical relevance.
The situation is different when non-duplicated features are the subject of study.
Although the application of the generic features of a probabilistic theory to the
single universe to which we have access has become a not-so-rare practice, to at-
tribute objective reality to the alternative universes of a probabilistic theory would
be a deviation from the principle of verifiability (provided the universes are strictly
non-communicating, either through tunneling or otherwise). The only justification
that I know of for applying the generic features of a probabilistic theory to a single
specimen with an already known behavior is as an economical model for the extant
data, with the utility of the model to be checked by the persistence of that economy
in explaining extensions of the data.
enter into the world's history: quasi-classical projections and amplified quantum
fluctuations. We can compress the description of the first type in the absence of
chaos, but how much chaos and other instabilities enhance the role of the second
type in our history is not known. Therefore, in our opinion, it may be premature
to ask: "Why is the world compressible?" (because we do not know if the degree of
compressibility will turn out to be truly remarkable); it may be better to first ask
a different question: "How complex is our world?" This is the same as asking for
an estimate of the algorithmic information in our universe. Today the cosmologists
have an estimate for the total entropy in the universe, but apparently not even a
rough idea about its algorithmic entropy.
In pointing out the ambiguity in compressibility, we do not mean to deny that
the existence of universal laws is remarkable by itself. If we look at only selective
aspects of the world and concern ourselves with only certain regularities, the econ-
omy of the laws is truly impressive. What we have argued is that this economy is
not suitably expressed as the brevity of the minimal program for our specific world;
however, it is possible that the economy of laws is related to the brevity of the core
of the program. The notion of a core can be illustrated with a particular universal
computer studied by Chaitin.1 It reads the first portion of a program and interprets
it as a symbolic expression defining a function in a "pure version" of LISP, and the
remainder of the program serves as the "input data." In this case the S-expression
is what we call the core of the program. The probability that the first k bits of a
random string happens to be a valid S-expression decreases as C·k^(-3/2) (Chaitin,1
appendix B); hence, it favors the occurrence of short S-expressions (in contrast,
the probability that the first n bits happens to be a valid program decreases only
slightly faster than c·n^(-1)). If one wants a machine-independent core length, one
could define it for other machines to be the same as for this computer through
simulation, so that the machine dependence enters only into the program length
and not the core length. In view of the permissive semantics of this "pure LISP,"
it is plausible that the higher probability for the occurrence of short S-expressions
implies also a high probability that even long minimal programs have short cores,
but we have not been able to prove it.
As this last section is more speculative than the previous ones, we summarize
the main points of the earlier sections. The a priori probabilities of algorithmic
information theory show how a certain amount of order can automatically emerge
from random inputs. These probabilities are relevant to complex physical systems
which have computational universality and which are influenced by random fluc-
tuations in the formative stage. For a system meeting the relevancy condition, the
implication for model building is that even a model which accounts for its particular
individual features can be much more economical than naively expected, because
out of the numerous fluctuations that it is exposed to, only a fraction is effective
in giving rise to robust features.
Hence,

E(log n / I) ≥ A + B Σ_{m>m₀} 1/log(m) = ∞.   (A2)

Eq. (5) follows from the fact that, with I < n, the number of strings with length
n is < 2^(n-I(n)+O(1)) and, hence, the number of such strings with I between n and
l_max = n + I(n) + O(1) is greater than or equal to 2^n (1 - 2^(-I(n)+O(1))). Here n in
the argument of I stands for the string which is the binary representation of the
number n. Then

E(I / n^(1-ε)) ≥ Σ_n n^ε 2^(-I(n)+O(1)) (1 - 2^(-I(n)+O(1))).   (A3)

The sum over n will be done by first summing over all numbers of a given length m,
and then summing over m. There is only a negligible fraction of the 2^(m-1) strings in
the first sum for which 2^(-I(n)) is of order 1, and the first sum is greater than or
equal to C·2^(εm)/m²; so the sum over m diverges.
ACKNOWLEDGMENTS
These topics were discussed at different times with Seth Lloyd, Ted Jacobson, and
Jim Hartle, and I thank them for helpful comments. I thank Charles Bennett for a
discussion of the possible effect of noises in the life game.
Some results on what were called "cores" in the last section have been ob-
tained by M. Koppel, who used a slightly different formulation. See M. Koppel,
"Structure," in The Universal Turing Machine, A Half-Century Survey, edited by
R. Herken (Oxford University Press, 1988). I thank Charles Bennett for bringing
this reference to my attention.
REFERENCES
1. Chaitin, G. Algorithmic Information Theory. Cambridge: Cambridge Univ.
Press, 1987.
2. Davies, P. C. W. "Why is the Physical World Understandable?" This volume.
3. Fredkin, E., and T. Toffoli. "Conservative Logic." Intl. J. Theor. Phys. 21
(1982):2,9.
4. Griffiths, R. B. "Consistent Histories and the Interpretation of Quantum Me-
chanics." J. Stat. Phys. 36 (1984):219.
5. Gacs, P., and J. Reif. "A Simple Three-Dimensional Real-Time Reliable Cel-
lular Array." J. Comput. Sys. Sci. 36 (1988):125.
6. Gell-Mann, M. "Entropy, Quantum and Classical Information, and Complex-
ity in the Universe." Report given at this workshop on collaborative work
with J. Hartle and V. Telegdi, 1989.
7. Kolmogorov, A. N. "Three Approaches to the Quantitative Definition of In-
formation." Prob. Info. Trans. 1 (1965):1.
8. Levin, L. A. "Various Measures of Complexity for Finite Objects." Sov. Math.
Dokl. 17 (1976):522.
9. Margolus, N. "Physics-Like Models of Computation." Physica 10D (1984):81.
10. Omnes, R. "The Interpretation of Quantum Mechanics." Phys. Lett. A125
(1987):170.
11. Solomonoff, R. J. "A Formal Theory of Inductive Inference." Info. & Control
7 (1964):1.
12. Zurek, W. H. "Algorithmic Randomness and Physical Entropy." Phys. Rev.
A40 (1989):4731-4751.
Charles H. Bennett
IBM Research, Yorktown Heights NY 10598, USA
INTRODUCTION
Natural irreversible processes are nowadays thought to have a propensity for self-
organization—the spontaneous generation of complexity (Figure 1). One may at-
tempt to understand the origin of complexity in several ways. One can attempt to
elucidate the actual course of galactic, solar, terrestrial, biological, and even cul-
tural evolution. One can attempt to make progress on epistemological questions
such as the anthropic principle3—the ways in which the complexity of the universe
is conditioned by the existence of sentient observers—and the question often raised
in connection with interpretations of quantum mechanics of what, if any, distinction
science should make between the world that did happen and the possible worlds that
might have happened. One can seek a cosmological "theory of everything" without
which it would seem no truly general theory of natural history can be built. Finally,
at an intermediate level of humility, one can attempt to discover general principles
governing the creation and destruction of complexity in the standard mathematical
models of many-body systems, e.g., stochastic cellular automata such as the Ising
model, and partial differential equations such as those of hydrodynamics or chemi-
cal reaction-diffusion. An important part of this latter endeavor is the formulation
of suitable definitions of complexity: definitions that on the one hand adequately
capture intuitive notions of complexity, and on the other hand are sufficiently ob-
jective and mathematical to prove theorems about. Below we list and comment on
several candidates for a complexity measure in physics, advocating one, "logical
depth," as most suitable for the development of a general theory of complexity in
many-body systems. Further details can be found in Bennett.6
LIFE-LIKE PROPERTIES
Life-like properties (e.g., growth, reproduction, adaptation) are very hard to de-
fine rigorously, and also are too dependent on function, as opposed to structure.
Intuitively, a dead human body is still complex, though it is functionally inert.
THERMODYNAMIC POTENTIALS
Thermodynamic potentials (entropy, free energy) measure a system's capacity for
irreversible change, but do not agree with intuitive notions of complexity. For ex-
ample, a bottle of sterile nutrient solution (Figure 2) has higher free energy, but
lower subjective complexity, than the bacterial culture it would turn into if inoc-
culated with a single bacterium. The rapid growth of bacteria following introduc-
tion of a seed bacterium is a thermodynamically irreversible process analogous to
crystalization of a supersaturated solution following introduction of a seed crystal.
Even without the seed either of these processes is vastly more probable than its
reverse: spontaneous melting of crystal into supersaturated solution, or transfor-
mation of bacteria into high-free-energy nutrient. The unlikelihood of a bottle of
sterile nutrient transforming itself into bacteria is therefore not a manifestation of
the second law, but rather of a putative new "slow growth" law that complexity,
however defined, ought to obey: complexity ought not to increase quickly, except
with low probability, but can increase slowly, e.g., over geological time as suggested
in Figure 1.
COMPUTATIONAL UNIVERSALITY
The ability of a system to be programmed through its initial conditions to simu-
late any digital computation. Computational universality, while it is an eminently
mathematical property, is still too functional to be a good measure of complexity
of physical states: it does not distinguish between a system capable of complex be-
havior and one in which the complex behavior has actually occurred. As a concrete
example, it is known that classical billiard balls,10 moving in a simple periodic po-
tential, can be prepared in an initial condition to perform any computation; but if
such a special initial condition has not been prepared, or if it has been prepared but
the computation has not yet been performed, then the billiard ball configuration
does not deserve to be called complex. Much can be said about the theory of uni-
versal computers; here we note that their existence implies that the input-output
relation of any one of them is a microcosm of all of deductive logic, and in particular
FIGURE 2 Complexity is not a thermodynamic potential like free energy. The second
law allows a bottle of sterile nutrient solution (high free energy, low complexity) to turn
into a bottle of bacteria (lower free energy, higher complexity), but a putative "slow
growth law" forbids this to happen quickly, except with low probability.
How to Define Complexity in Physics, and Why 141
ALGORITHMIC INFORMATION
Algorithmic Information (also called Algorithmic Entropy or Solomonoff-
Kolmogorov-Chaitin Complexity) is the size in bits of the most concise univer-
sal computer program to generate the object in question.1,8,9,14,19,20 Algorithmic
entropy is closely related to statistically defined entropy, the statistical entropy of
an ensemble being, for any concisely describable ensemble, very nearly equal to the
ensemble average of the algorithmic entropy of its members; but for this reason al-
gorithmic entropy corresponds intuitively to randomness rather than to complexity.
Just as the intuitively complex human body is intermediate in entropy between a
crystal and a gas, so an intuitively complex genome or literary text is intermediate
in algorithmic entropy between a random sequence and a perfectly orderly one.
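The ordering described here — random above text above crystal-like order — can be illustrated with an off-the-shelf compressor as a crude, computable stand-in for algorithmic entropy (zlib is of course not a minimal universal code, so the byte counts are only indicative):

```python
import random
import zlib

random.seed(0)

periodic = b"ab" * 500                      # a "crystal": perfectly orderly
text = (b"It was the best of times, it was the worst of times, "
        b"it was the age of wisdom, it was the age of foolishness, "
        b"it was the epoch of belief, it was the epoch of incredulity. ") * 6
noise = bytes(random.randrange(256) for _ in range(1000))   # a "gas": random

def size(s):
    """Compressed size in bytes: a rough proxy for algorithmic entropy."""
    return len(zlib.compress(s, 9))
```

The literary sample lands between the two extremes — which is exactly why compressed size tracks randomness, not intuitive complexity.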
LONG-RANGE ORDER
Long-Range Order, the existence of statistical correlations between arbitrarily re-
mote parts of a body, is an unsatisfactory complexity measure, because it is present
in such intuitively simple objects as perfect crystals.
weather news). Rather it reflects each newspaper's descent from a common causal
origin in the past. Similar correlations exist between genomes and organisms in
the biosphere, reflecting the shared frozen accidents of evolution. This sort of long-
range mutual information, not mediated by the intervening medium, is an attractive
complexity measure in many respects, but it fails to obey the putative slow-growth
law mentioned above: quite trivial processes of randomization and redistribution,
for example smashing a piece of glass and stirring up the pieces, or replicating
and stirring a batch of random meaningless DNA, generate enormous amounts of
remote non-additive entropy very quickly.
LOGICAL DEPTH
Logical Depth = Execution time required to generate the object in question by a
near-incompressible universal computer program, i.e., one not itself computable as
output of a significantly more concise program. Logical depth computerizes the Oc-
cam's razor paradigm, with programs representing hypotheses and outputs representing
phenomena: a hypothesis is considered plausible only if it cannot be reduced to a
simpler (more concise) hypothesis. Logically deep objects, in other words, contain
internal evidence of having been the result of a long computation or slow-to-simulate
dynamical process and could not plausibly have originated otherwise. Logical depth
satisfies the slow-growth law by construction.
THERMODYNAMIC DEPTH
The amount of entropy produced during a state's actual evolution has been proposed
as a measure of complexity by Lloyd and Pagels.16 Thermodynamic depth
can be very system-dependent: some systems arrive at very trivial states through
much dissipation; others at very nontrivial states with little dissipation.
portion of the history recent enough not to have been swamped by dynamically
amplified environmental noise.
FIGURE 3 An equilibrium crystal divided into regions 1 and 2; the joint entropy
satisfies S_{1,2} << S_1 + S_2.
THEORY OF COMPUTATION
The conjectured inequality of the complexity classes P and PSPACE is a necessary
condition, and the stronger conjecture of the existence of "one-way" functions7,15 is
a sufficient condition, for certain very idealized physical models (e.g., billiard balls)
to generate logical depth efficiently.
ERROR-CORRECTING COMPUTATION
What collective phenomena suffice to allow error-correcting computation and/or
the generation of complexity to proceed despite the locally destructive effects of
noise? In particular, how does dissipation favor the generation and maintenance of
complexity in noisy systems?
n Dissipation allows error-correction, a many-to-one mapping in phase space.
n Dissipative systems are exempt from the Gibbs phase rule. In typical
d-dimensional equilibrium systems with short-ranged interactions, barring sym-
metries or accidental degeneracy of parameters such as occurs on a coexistence
line, there is a unique thermodynamic phase of lowest free energy.4 This ren-
ders equilibrium systems ergodic and unable to store information reliably in
the presence of "hostile" (i.e., symmetry-breaking) noise. Analogous dissipative
systems, because they have no defined free energy in d dimensions, are exempt
from this rule. A (d + 1)-dimensional free energy can be defined, but varying
the parameters of the d-dimensional model does not in general destabilize one
phase relative to another.
n What other properties besides irreversibility does a system need to take ad-
vantage of the exemption from Gibbs phase rule? In general the problem is to
correct erroneous regions, in which the data or computation locally differs from
that originally stored or programmed into the system. These regions, which
may be of any finite size, arise spontaneously due to noise and to subsequent
propagation of errors through the system's normal dynamics. Local majority
voting over a symmetric neighborhood, as in the Ising model at low temper-
ature, is insufficient to suppress islands when the noise favors their growth.
Instead of true stability, one has a metastable situation in which small islands
are suppressed by surface tension, but large islands grow. Two methods are
known for achieving absolute stability in the presence of symmetry-breaking
noise.
Anisotropic Voting Rules4,12,17 in two or more dimensions contrive to shrink
arbitrarily large islands by differential motion of their boundaries. The rule is such
that any island, while it may grow in some directions, shrinks in others; eventually
the island becomes surrounded by shrinking facets only and disappears (Figure 4).
The requisite anisotropy need not be present initially, but may arise through spon-
taneous symmetry breaking.
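Toom's north-east-center majority rule is the standard concrete example of such an anisotropic voting rule (our own sketch, not the exact two-phase rule of Figure 4): each cell adopts the majority vote of itself, its northern neighbor, and its eastern neighbor, so any island of the minority phase erodes from its north-east corner and disappears. In the noiseless limit:

```python
# Toom's north-east-center (NEC) majority rule on an N x N grid.
N = 24

def step(grid):
    new = [[0] * N for _ in range(N)]
    for r in range(N):
        for c in range(N):
            # Vote of the cell itself, its northern and its eastern neighbor.
            votes = grid[r][c] + grid[(r - 1) % N][c] + grid[r][(c + 1) % N]
            new[r][c] = 1 if votes >= 2 else 0
    return new

# A large square island of the minority phase in a sea of zeros.
grid = [[0] * N for _ in range(N)]
for r in range(8, 16):
    for c in range(8, 16):
        grid[r][c] = 1

initial_island = sum(map(sum, grid))
for _ in range(3 * N):
    grid = step(grid)
survivors = sum(map(sum, grid))
```

A 1-cell survives only if its northern or eastern neighbor is also 1, and no 0-cell can flip, so the island is eaten away along its north-east staircase in time linear in its side; with weak symmetry-breaking noise added, small error islands are likewise eroded faster than they can grow, which is the stabilization mechanism described above.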
Hierarchical Voting Rules. These complex rules, in one or more dimensions,
correct errors by a programmed hierarchy of blockwise majority voting. The com-
plexity arises from the need of the rule to maintain the hierarchical structure, which
exists only in software.
SELF-ORGANIZATION
Is "self-organization," the spontaneous increase of complexity, an asymptotically
qualitative phenomenon like phase transitions? In other words, are there reason-
able models whose complexity, starting from a simple uniform initial state, not
only spontaneously increases, but does so without bound in the limit of infinite
space and time? Adopting logical depth as the criterion of complexity, this would
mean that for arbitrarily large times t most parts of the system at time t would
FIGURE 4 Anisotropic Voting Rules stabilize information against symmetry-breaking
noise. It is not difficult to find irreversible voting models in which the growth velocity
of a phase changes sign depending on boundary orientation (this is impossible in
reversible models, where growth must always favor the phase of lowest bulk free
energy). Here we show the fate of islands in an irreversible two-phase system in
which growth favors one phase (stippled) at diagonal boundaries and the other phase
(clear) at rectilinear boundaries. (a-c) An island of the clear phase becomes square and
disappears. Similarly (d-f) an island of the stippled phase becomes diamond-shaped
and disappears. Small perturbations of the noise perturb the boundary velocities slightly
but leave the system still able to suppress arbitrarily large fluctuations of either phase.
contain structures that could not plausibly have been generated in time much less
than t. A positive answer to this question would not explain the history of our
finite world, but would suggest that its quantitative complexity can be legitimately
viewed as an approximation to a well-defined property of infinite systems. On the
other hand, a negative answer would suggest that our world should be compared to
chemical reaction-diffusion systems that self-organize on a macroscopic but finite
scale, or to hydrodynamic systems that self-organize on a scale determined by their
boundary conditions, and that the observed complexity of our world may not be
"spontaneous" but rather heavily conditioned by the anthropic requirement that it
produce observers.
EQUILIBRIUM SYSTEMS
Which equilibrium systems (e.g., spin glasses, quasicrystals) have computationally
complex ground states?
DISSIPATIVE PROCESSES
Do dissipative processes such as turbulence, that are not explicitly genetic or com-
putational, still generate large amounts of remote non-additive entropy? Do they
generate logical depth? Does a waterfall contain objective evidence, maintained de-
spite environmental noise, of a nontrivial dynamical history leading to its present
state, or is there no objective difference between a day-old waterfall and a year-
old one? See Ahlers and Walden1 for evidence of fairly long-term pseudorandom
behavior near the onset of convective turbulence.
ACKNOWLEDGEMENTS
Many of the ideas in this paper were shaped in years of discussions with Gregory
Chaitin, Rolf Landauer, Peter Gacs, Geoff Grinstein, and Joel Lebowitz.
REFERENCES
1. Ahlers, G., and R. W. Walden. "Turbulence near Onset of Convection." Phys.
Rev. Lett. 44 (1980):445.
2. Barrow, J. D., and F. J. Tipler. The Anthropic Cosmological Principle. Ox-
ford: Oxford University Press, 1986.
INTRODUCTION
The dynamical behavior of complex information processing systems, and how those
behaviors may be improved by natural selection, or other learning or optimizing
processes, are issues of fundamental importance in biology, psychology, economics,
and, not implausibly, in international relations and cultural history. Biological evo-
lution is perhaps the foremost example. No serious scientist doubts that life arose
from non-life as some process of increasingly complex organization of matter and
energy. A billion years later we confront organisms that have evolved from simple
precursors, that unfold in their own intricate ontogenies, that sense their worlds,
categorize the states of those worlds with respect to appropriate responses, and in
their interactions form complex ecologies whose members coadapt more or less suc-
cessfully over ecological and evolutionary time scales. We suppose, probably rightly,
that Mr. Darwin's mechanism, natural selection, has been fundamental to this as-
tonishing story. We are aware that, for evolution to "work," there must be entities
which in some general sense reproduce, but do so with some chance of variation.
That is, there must be heritable variation. Thereafter, Darwin argues, the differ-
ences will lead to differential success, culling out the fitter, leaving behind the less
fit.
But, for at least two reasons, Darwin's insight is only part of the story. First,
in emphasizing the role of natural selection as the Blind Watchmaker, Darwin and
his intellectual heritors have almost come to imply that without selection there
would be no order whatsoever. It is this view which sees evolution as profoundly
historically contingent; a story of the accidental occurrence of useful variations ac-
cumulated by selection's sifting: evolution as the Tinkerer. But second, in telling
us that natural selection would cull the fitter variants, Darwin has implicitly as-
sumed that successive cullings by natural selection would be able to successively
accumulate useful variations. This assumption amounts to presuming what I shall
call evolvability. Its assumption is essential to a view of evolution as a tinkerer which
cobbles together ad hoc but remarkable solutions to design problems. Yet "evolv-
ability" is not itself a self-evident property in complex systems. Therefore, we must
wonder what the construction requirements may be which permit evolvability, and
whether selection itself can achieve such a system.
Consider the familiar example of a standard computer program on a sequential
von Neumann universal Turing machine. If one were to randomly exchange the order
of the instructions in a program, the typical consequence would be catastrophic
change in the computation performed.
Try to formulate the problem of evolving a minimal program to carry out some
specified computation on a universal Turing machine. The idea of a minimal pro-
gram is to encode the program in the shortest possible set of instructions, and per-
haps initial conditions, in order to carry out the desired computation. The length
of such a minimal program would define the algorithmic complexity of the compu-
tation. Ascertainment that a given putative minimal program is actually minimal,
however, cannot in general be carried out. Ignore for the moment the problem of
Requirements for Evolvability in Complex Systems 153
ascertainment, and consider the following: Is the minimal program itself likely to
be evolvable? That is, does one imagine that a sequence of minimal alterations in
highly compact computer codes could lead from a code which did not carry out the
desired computation to one which did?
I do not know the answer; nevertheless, it is instructive to characterize the
obstacles. Doing so helps define what one might mean by "evolvability." In order
to evolve across the space of programs to achieve a given compact code to carry
out a specified computation, we must first be able to ascertain that any given
program actually carries out the desired computation. Think of the computation
as the "phenotype," and the program as the "genotype." For many programs, it
is well known that there is no short cut to "seeing the computation" carried out
beyond running the program and observing what it "does." That is, in general,
given a program, we do not know what computation it will perform by any shorter
process than observing its "phenotype." Thus, to evolve our desired program, we
must have a process which allows candidate programs to exhibit their phenotypes,
then a process which chooses variant programs and "evolves" towards the target
minimal compact program across some defined program space. Since programs, and
if need be their input data, can be represented as binary strings, we can represent
the space of programs in some high-dimensional Boolean hyperspace. Each vertex
is then a binary string, and evolution occurs across this space to or toward the
desired minimal target program.
Immediately we find two problems. First, can we define a "figure of merit"
which characterizes the computation carried out by an arbitrary program—defines
its phenotype—which can be used to compare how "close" the phenotype of the
current program is to that of the desired target program? This requirement is impor-
tant since, if we wish to evolve from an arbitrary program to one which computes
our desired function, we need to know if alterations in the initial program bring
the program closer or further from the desired target program. The distribution of
this figure of merit, or, to a biologist, "fitness," across the space of programs defines
the "fitness landscape" governing the evolutionary search process. Such a fitness
landscape may be smooth and single peaked, with the peak corresponding to the
desired minimal target program, or may be very rugged and multipeaked. In the
latter case, typical of complex combinatorial optimization problems, any local evo-
lutionary search process is likely to become trapped on local peaks. In general, in
such tasks, attainment of the global optimum is an NP-complete problem, and an
evolutionary search will not attain the global optimum in reasonable time. Thus, the
second problem with respect to evolvability of programs relates to how rugged and
multipeaked the fitness landscape is. The answers are not known, but the intuition
is clear. The more compact the code becomes, the more violently the computation
carried out by the code changes at each minimal alteration of the code. That is,
long codes may have a variety of internal sources of redundancy which allows small
changes in the code to lead to small changes in the computation. By definition, a
minimal program is devoid of such redundancy. Thus, inefficient redundant codes
may occupy a landscape which is relatively smooth and highly correlated in the
sense that nearby programs have nearly the same fitness by carrying out similar
154 Stuart A. Kauffman
computations. But as the programs become shorter, small changes in the programs
induce ever more pronounced changes in the phenotypes. That is, the landscapes
become ever more rugged and uncorrelated. In the limit where fitness landscapes
are entirely uncorrelated, such that the fitness of "1-mutant" neighbors in the space
are random with respect to one another, it is obvious that the fitness of a neighbor
carries no information about which directions are good directions to move across
the space in an evolutionary search for global, or at least good, optima. Evolution
across fully uncorrelated landscapes amounts to an entirely random search pro-
cess where the landscape itself provides no information about where to search.10
In short, since minimal programs almost surely "live on" fully uncorrelated land-
scapes in program space, one comes strongly to suspect that minimal programs are
not themselves evolvable.
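The contrast between correlated and uncorrelated landscapes can be made concrete with a toy adaptive walk on bit strings. The two fitness functions below are illustrative assumptions, not from the text: the fraction of 1-bits serves as a smooth, single-peaked landscape, and an independent random value per string serves as a fully uncorrelated one.

```python
import random

def neighbors(s):
    """All 1-mutant (Hamming distance 1) neighbors of a bit tuple."""
    return [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(len(s))]

def adaptive_walk(fitness, start):
    """Greedy hill climb: move to the fittest 1-mutant neighbor until no
    neighbor improves on the current string, i.e., a local optimum."""
    s = start
    while True:
        best = max(neighbors(s), key=fitness)
        if fitness(best) <= fitness(s):
            return s
        s = best

N = 12
random.seed(1)
start = tuple(random.randint(0, 1) for _ in range(N))

# Smooth, correlated landscape: fitness is the fraction of 1-bits, so every
# neighbor's fitness says which direction is "uphill."
def smooth(s):
    return sum(s) / N

peak = adaptive_walk(smooth, start)   # climbs all the way to (1,...,1)

# Fully uncorrelated landscape: an independent random fitness per string;
# a neighbor's fitness carries no information about the rest of the space.
_table = {}
def rugged(s):
    if s not in _table:
        _table[s] = random.random()
    return _table[s]

local_opt = adaptive_walk(rugged, start)  # halts on some local peak
```

On the smooth landscape the walk reaches the unique global peak; on the uncorrelated one it merely stops at whatever local peak it stumbles into, exactly the trapping the text describes.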
Analysis of the conditions of evolvability, therefore, requires understanding:
1) What kinds of systems "live on" what kinds of "fitness landscapes"; 2) what
kinds of fitness landscapes are "optimal" for adaptive evolution; and 3) whether
there may be selective or other adaptive processes in complex systems which might
"tune" 1) and 2) to achieve systems which are able to evolve well.
Organisms are the paradigm examples of complex systems which patently have
evolved, hence now do fulfill the requirements of evolvability. Despite our fascina-
tion with sequential algorithms, organisms are more adequately characterized as
complex parallel-processing dynamical systems. A single example suffices to make
this point. Each cell of a higher metazoan such as a human harbors an identi-
cal, or nearly identical, copy of the same genome. The DNA in each cell specifies
about 100,000 distinct "structural" genes, that is, those which code for a protein
product. Products of some genes regulate the activity of other genes in a complex
regulatory web which I shall call the genomic regulatory network. Different cell
types in an organism, nerve cell, muscle cell, liver hepatocyte, and so forth, differ
from one another because different subsets of genes are active in the different cell
types. Muscle cells synthesize myoglobin; red blood cells contain hemoglobin. Dur-
ing ontogeny from the zygote, genes act in parallel, synthesizing their products, and
mutually regulating one another's synthetic activities. Cell differentiation, the pro-
duction of diverse cell types from the initial zygote, is an expression of the parallel
processing on the order of 10,000 to 100,000 genes in each cell lineage. Thus the
metaphor of a "developmental program" encoded by the DNA and controlling on-
togeny is more adequately understood as pointing to a parallel-processing genomic
dynamical system whose dynamical behavior unfolds in ontogeny. Understanding
development from the zygote, and the evolution of development, hence the evolv-
ability of ontogeny, requires understanding how such parallel-processing dynamical
systems might give rise to an organism, and be molded by mutation and selection.
Other adaptive features of organisms, ranging from neural networks to the
anti-idiotype network in the immune system, are quite clearly examples of parallel-
processing networks whose dynamical behavior and changes with learning, or with
antigen exposure, constitute the "system" and exhibit its evolvability.
monomers have almost always bound an oxygen, thus further increases in oxygen
concentration do not increase the amount bound per hemoglobin molecule. The
response saturates. This means that a graph of bound oxygen concentration as a
function of oxygen tension is S-shaped, or sigmoidal, starting by increasing slowly,
becoming increasingly steep, then passing through an inflection and bending over,
and increasing more slowly again to a maximal asymptote.
Positive cooperativity and ultimate saturation in enzyme systems, cell recep-
tor systems, binding of regulatory molecules to DNA regulatory sites,1,45,52 and
other places are extremely common in biological systems. Consequently, sigmoidal
response functions are common as well.
The vital issue is to realize that even with a "soft" sigmoidal function whose
maximum slope is less than vertical, coupled systems governed by such functions are
properly idealized by on/off systems. It is easy to see intuitively why this might
be so. Consider a sigmoidal function graphed on a plane, and on the same plane
a constant, or proportional response where the output response is equal to the
input, i.e., the slope is 1.0. The sigmoidal function is initially below the proportional
response. Thus a given input leads to even less output. Were that reduced output
fed back as the next input, then the subsequent response would be even less. Over
iterations the response would dwindle to 0. Conversely, the sigmoidal response
becomes steep in its mid-range and crosses above the proportional response. An
input above this critical crossing point leads to a greater than proportional output.
In turn, were that output fed back as the next input, the response would be still greater
than that input. Over iterations the response would climb to a maximal response.
That is, feedback of signals through a sigmoidal function tends to sharpen to an
all-or-none response.25,60 This is the basic reason that the "on/off" idealization of
a flip-flop in a computer captures the essence of its behavior.
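A minimal numerical sketch of this sharpening, using an assumed Hill function (exponent 4, half-maximal point 0.5) rather than any particular biochemical response: feeding the output back as the next input drives any starting value below the crossing point down toward 0, and any starting value above it up toward the maximal response.

```python
def hill(x, n=4, k=0.5):
    """A "soft" sigmoidal response: a Hill function, half-maximal at
    x = k, whose maximum slope is finite, not vertical."""
    return x ** n / (x ** n + k ** n)

def iterate(x, steps=60):
    """Feed the output back in as the next input, repeatedly."""
    for _ in range(steps):
        x = hill(x)
    return x

low = iterate(0.40)    # starts below the crossing point at x = 0.5
high = iterate(0.60)   # starts above it
```

The point x = 0.5 is an unstable fixed point of this particular map: trajectories starting below it dwindle toward 0, while those starting above it climb to the upper fixed point near 0.92, an all-or-none outcome despite the "soft" slope.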
In summary, logical switching systems capture major features of a homologous
class of nonlinear dynamical systems governed by sigmoidal functions because such
systems tend to sharpen their responses to extreme values of the variables. The
logical, or switching, networks can then capture the logical skeleton of such contin-
uous systems. However, the logical networks miss detailed features and in particu-
lar typically cannot represent the internal unstable steady states of the continuous
system. Thus Boolean networks are a caricature, but a good one, an idealization
which is very powerful, with which to think about a very broad class of continu-
ous nonlinear systems as well as switching systems in their own right. I stress that
it is now well established that switching systems are good idealizations of many
nonlinear systems.25 But just how broad the class of nonlinear systems which are
"homologous" in a useful sense to switching networks remains a large mathematical
problem.
F = 2^(2^K).    (1)
The number of possible Boolean functions increases rapidly as the number of inputs,
K, increases. For K = 2 there are 2^(2^2) = 16 possible Boolean functions. For K = 3
there are 256 such functions. But by K = 4 the number is 2^16 = 65,536, while for
K = 5 the number is 2^32 = 4.3 x 10^9. As we shall see, special subclasses of the
possible Boolean functions are important for the emergence of orderly collective
dynamics in large Boolean networks.
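These counts are easy to confirm by direct enumeration, since a Boolean function of K inputs is nothing more than a truth table with one 0/1 response per input combination. A short sketch:

```python
from itertools import product

def boolean_functions(K):
    """All Boolean functions of K inputs, each represented as a truth
    table: one 0/1 response for each of the 2**K input combinations."""
    return list(product((0, 1), repeat=2 ** K))

# Equation (1): F = 2^(2^K) possible functions of K inputs.
F = {K: 2 ** (2 ** K) for K in (1, 2, 3, 4, 5)}
```

Enumerating is feasible only for small K; by K = 5 the count already exceeds four billion, which is why the formula, not the list, is what matters.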
An autonomous Boolean network is specified by choosing for each binary ele-
ment which K elements will serve as its regulatory inputs, and assigning to each
binary element one of the possible Boolean functions of K inputs. If the network
has no inputs from "outside" the system, it is considered to be "autonomous." Its
behavior depends upon itself alone.
Figure 1(a) shows a Boolean network with three elements, 1, 2, and 3. Each re-
ceives inputs from the other two. 1 is governed by the AND function, 2 is governed
by the OR function, and 3 is governed by the OR function. The simplest class of
Boolean networks are synchronous. All elements update their activities at the same
moment. To do so each element examines the activities of its K inputs, consults its
Boolean function, and assumes the prescribed next state of activity. This is sum-
marized in Figure 1(b). Here I have rewritten the Boolean rules. Each of the 2^3 = 8
possible combinations of activities of the three elements corresponds to one state
of the entire network. Each state at one moment causes all the elements to assess
[Figure 1 appears here; see the caption below. The state transition table of
panel (b), reconstructed from the rules in panel (a), reads:

   T: 1 2 3     T+1: 1 2 3
      0 0 0          0 0 0
      0 0 1          0 1 0
      0 1 0          0 0 1
      0 1 1          1 1 1
      1 0 0          0 1 1
      1 0 1          0 1 1
      1 1 0          0 1 1
      1 1 1          1 1 1

Panel (c) draws the corresponding state transition graph with its three state
cycles; panel (d) draws the graph after mutating element 2 from OR to AND.]
FIGURE 1 (a) The wiring diagram in a Boolean network with three binary elements,
1,2,3, each an input to the other two. One element is governed by the Boolean AND
function, the other two by the OR function. (b) The Boolean rules of (a) rewritten
showing for all 2^3 = 8 states of the Boolean network at time T, the activity assumed by
each element at the next time moment, T + 1. Read from left to right this figure shows,
for each state, its successor state. (c) The state transition graph, or behavior field, of
the autonomous Boolean network in (a) and (b), obtained by showing state transitions
to successor states, (b), as connected by arrows, (c). This system has 3 state cycles.
Two are steady states (000) and (111), the third is a cycle with two states. Note that
(111) is stable to all single Hamming unit perturbations, e.g., to (110), (101), or (011),
while (000) is unstable to all such perturbations. (d) Effects of mutating rule of element
2 from OR to AND. From Origins of Order: Self Organization in Evolution by S. A.
Kauffman. Copyright © 1990 by Oxford University Press, Inc. Reprinted by permission.
the values of their regulatory inputs, and, at a clocked moment, assume the proper
next activity. Thus, at each moment, the system passes from a state to a unique
successor state.
Over a succession of moments the system passes through a succession of states,
called a trajectory. Figure 1(c) shows these successions of transitions.
The first critical feature of autonomous Boolean networks is this: since there
is a finite number of states, the system must eventually reenter a state previously
encountered; thereafter, since the system is deterministic and must always pass
from a state to the same successor state, the system will cycle repeatedly around
this state cycle. These state cycles are the dynamical attractors of the Boolean
network. The set of states flowing into one state cycle or lying on it constitutes the
basin of attraction of that state cycle. The length of a state cycle is the number of
states on the cycle, and can range from 1 for a steady state to 2^N.
Any such network must have at least one state cycle attractor, but may have
more than one, each draining its own basin of attraction. Further, since each state
drains into only one state cycle, the set of state cycles are the dynamical attractors
of the system, and their basins partition the 2^N state space of the system.
The simple Boolean network in Figure 1(a) has three state cycle attractors, shown in Figure 1(c).
Each is a discrete alternative recurrent asymptotic pattern of activities of the N
elements in the network. Left to its own, the system eventually settles down to one
of its state cycle attractors and remains there.
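The three-element network of Figure 1 is small enough to simulate exhaustively. A minimal sketch using the rules given in the text (element 1 computes AND of elements 2 and 3; elements 2 and 3 each compute OR of the other two) that enumerates all eight states and sorts them into the basins of their attractors:

```python
from itertools import product

# Rules from Figure 1(a): element 1 is AND of elements 2 and 3;
# elements 2 and 3 are each OR of the other two elements.
def step(state):
    s1, s2, s3 = state
    return (s2 & s3, s1 | s3, s1 | s2)

def attractor(state):
    """Iterate the synchronous update until a state repeats; return the
    state cycle the trajectory has entered."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return frozenset(seen[seen.index(state):])

# Group all 2^3 = 8 states into the basins of their attractors.
basins = {}
for s in product((0, 1), repeat=3):
    basins.setdefault(attractor(s), []).append(s)

cycles = set(basins)
```

Running this recovers exactly the picture in Figure 1(c): two steady states, (000) and (111), plus one two-state cycle (001) <-> (010); the basin of (111) holds five of the eight states, while (000) is an isolated state draining only itself.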
The stability of attractors to minimal perturbation may differ. A minimal per-
turbation in a Boolean network consists in transiently "flipping" the activity of an
element to the opposite state. Consider Figure 1(c). The first state cycle is a steady
state, or state cycle of length one, (000) which remains the same over time. Tran-
sient flipping of any element to the active state, e.g., (100), (010), or (001), causes the
system to move to one of the remaining two basins of attraction. Thus the (000)
state cycle attractor is unstable to any perturbations. In contrast, the third state
cycle is also a steady state (111). But it remains in the same basin of attraction for
any single perturbation (011), (101), or (110). Thus this attractor is stable to all
possible minimal perturbations.
A structural perturbation is a permanent "mutation" in the connections or
Boolean rules in the Boolean network. In Figure 1(d) I show the result of mutating
the rule governing element 2 from the OR function to the AND function. As you
can see, this alteration has not changed state cycle (000) or state cycle (111),
but has altered the second state cycle. In addition, state cycle (000) which was
an isolated state now drains a basin of attraction and is stable to all minimal
perturbations, while (111) has become an isolated state and is now unstable to all
minimal perturbations.
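This structural mutation is also easy to check by simulation. A self-contained sketch of the mutant network of Figure 1(d), with element 2's rule changed from OR to AND and the other rules as in the text:

```python
# Figure 1(d): element 2's rule is mutated from OR to AND; the rules for
# elements 1 (AND) and 3 (OR) are unchanged.
def step_mutant(state):
    s1, s2, s3 = state
    return (s2 & s3, s1 & s3, s1 | s2)

def settles_to(state, steps=10):
    for _ in range(steps):
        state = step_mutant(state)
    return state

# (000) and (111) are still steady states of the mutant network ...
still_fixed = (step_mutant((0, 0, 0)) == (0, 0, 0) and
               step_mutant((1, 1, 1)) == (1, 1, 1))
# ... but now every single-flip perturbation of (000) returns to it,
stable_000 = all(settles_to(p) == (0, 0, 0)
                 for p in [(1, 0, 0), (0, 1, 0), (0, 0, 1)])
# ... while no single-flip perturbation of (111) returns to (111).
unstable_111 = all(settles_to(p) != (1, 1, 1)
                   for p in [(0, 1, 1), (1, 0, 1), (1, 1, 0)])
```

A one-rule mutation thus leaves both steady states in place yet inverts their stability, which is precisely the kind of change that shapes the adaptive landscape discussed below.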
To summarize, the following properties of autonomous Boolean networks are of
immediate interest:
1. The number of states around a state cycle is called its length. The length can
range from 1 state for a steady state to 2^N states.
2. The number of alternative state cycles. At least one must exist. But a maximum
of 2^N might occur. These are the permanent asymptotic alternative behaviors
of the entire system.
3. The sizes of the basins of attraction drained by the state cycle attractors.
4. The stability of attractors to minimal perturbation, flipping any single element
to the opposite activity value.
5. The changes in dynamical attractors and basins of attraction due to mutations
in the connections or Boolean rules. These changes will underlie the character of
the adaptive landscape upon which such Boolean networks evolve by mutation
to the structure and rules of the system.
Boolean networks are discrete dynamical systems. The elements are either ac-
tive or inactive. The major difference between a continuous and a discrete deter-
ministic dynamical system is that two trajectories in a discrete system can merge.
To be concrete, Figure 1(c) shows several instances where more than one state
converge upon the same successor state.
"logic," and ask whether orderly behavior emerges nevertheless. Note that such be-
havior is occurring in a parallel-processing network. All elements compute their next
activities at the same moment. If we find order in random networks, then "random"
parallel networks with random logic have order despite an apparent cacophony of
structure and logic.
the ends of the tails. On the order of N^(1/2) of the N elements
lie on loops. Each separate loop has its own dynamical behavior and cannot
influence the other structurally isolated loops. Thus such a system is structurally
modular. It is composed of separate, isolated subsystems. The overall behavior
of such systems is the product of the behaviors of the isolated systems. As
Table 1 shows, the median lengths of state cycles increase rather slowly as N
increases, the number of attractors increases exponentially as N increases, and
their stability is moderate. There are four Boolean functions of K = 1 input,
"yes," "not," "true," and "false." The last two functions are constantly active,
or inactive. The values in Table 1 assume that only the Boolean functions "yes"
and "not" are utilized in K = 1 networks. When all four functions are allowed,
most isolated loops fall to fixed states, and the dynamical behavior is dominated
by those loops with no "true" or "false" functions assigned to elements of the
loop. Flyvbjerg and Kjaer16 and Jaffee26 have derived detailed results for this
analytically tractable case.
Next consider a system of several binary variables, each receiving inputs from
two or three of the other variables, and each active at the next moment if any one
of its inputs is active at the current moment (see Figure 2). That is, each element
is governed by the OR function on its inputs. As shown in Figure 2, this small
network has feedback loops. Now the consequence of the fact that all elements are
governed by the OR function on their inputs is that if a specific element is cur-
rently in the "1" state, at the next moment all of those elements that it regulates
are guaranteed or FORCED to be in the "1" state. Thus the "1" value is guar-
anteed to propagate from any initially active element in the net, iteratively to all
"descendents" in the net. But the net has loops; thus the guaranteed "1" value
cycles around such a loop. Once the loop has "filled up" with "1" values at each
element, the loop remains in a fixed state with "1" at each element in the loop, and
cannot be perturbed by outside influences of other inputs into the loop. Further the
"fixed" "1" values propagate to all descendents of the feedback loop, fixing them in
the "1" value as well. Such circuits are called forcing loops and descendent forcing
structures.17,18,19,30,31,32,34,35,36 Note that the fixed behavior of such a part of the
network provides walls of constancy. No signal can pass through elements once they
are frozen in their forced values.
[Figures 2 and 3 appear here: a small network in which each element is governed
by the OR function on its inputs (Figure 2), and a forcing structure whose
elements are governed by canalizing functions (OR, AND, IF, NOT IF), shown with
their truth tables (Figure 3).]
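The filling-up argument can be watched directly in a tiny all-OR network. The wiring below is hypothetical (a three-element feedback loop in which each loop element takes the other two as inputs, plus two downstream "descendants"), chosen only to illustrate the propagation, not taken from Figure 2:

```python
# Hypothetical wiring: elements 0, 1, 2 form a feedback loop (each takes
# the other two loop elements as inputs); 3 and 4 are downstream
# descendants.  Every element computes OR over its inputs.
inputs = {0: [1, 2], 1: [2, 0], 2: [0, 1], 3: [2], 4: [3]}

def step(state):
    return tuple(int(any(state[j] for j in inputs[i]))
                 for i in range(len(state)))

state = (1, 0, 0, 0, 0)      # one transiently active element in the loop
for _ in range(10):
    state = step(state)
frozen = state               # the loop fills with 1s, then the 1s
                             # propagate to all descendants and stay
```

Within a few steps the loop has "filled up" with 1 values, the descendants follow, and the all-1 state is a fixed point: the forced values can never be undone by further updates.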
The limitation to the OR function is here made only to make the picture clear.
In Figure 3 I show a network with a forcing structure in which a 1 state at some
specific elements force a descendant element to be in the 0 state, which in turn
forces its descendent element to be in the 1 state. The key, and defining feature, of
a forcing structure in a Boolean network is that at each point, a single element has a
single state which can force a descendent element to a specific state regardless of the
activities of other inputs. Propagation of such guaranteed, or forced, states occurs
via the forcing connections in the network. For a connection between two regulated
elements to be classed as "forcing," the second element must be governed by a
canalizing Boolean function, and the first element, which is an input to the second
element, must itself directly or indirectly (i.e., via K = 1 input connections) be
governed by a canalizing Boolean function, and the value of the first element which
can be "guaranteed" must be the value of the first element which itself guarantees
the activity of the second element. Clearly a network of elements governed by the
OR function meets these requirements. More generally, they create a transitive
relation such that if A forces B and B forces C, then A indirectly forces C via B.
Guaranteed, or forced, values must propagate down a connected forcing structure.
Large networks of N switching elements, each with K = 2 inputs drawn at
random from among the N, and each assigned at random one of the 2^(2^K) Boolean
switching functions on K inputs, are random disordered systems. Nevertheless, they
can exhibit markedly ordered behavior with small attractors, with homeostasis and,
as, we see below, with highly correlated fitness landscapes. The reason for this is
that large forcing structures exist in such networks. The forcing structures form
a large connected interconnected web of components which stretches or percolates
across the entire network.18,19,23,30,31,34,35,36,37,38 This web falls to a fixed state,
each element frozen in its forced value and leaves behind functionally isolated islands
of elements which are not part of the forcing structure. These isolated islands are
each an interconnected cluster of elements which communicates internally. But the
island clusters are functionally isolated from one another because signals cannot
pass through the walls of constancy formed by the percolating forcing structure.
The occurrence of such walls of constancy due to the percolation of extended
forcing structures depends upon the character of the switching network, and in
particular on the number of variables which are inputs to each variable, that is,
upon the connectivity of the dynamical system. Large connected forcing structures
form, or "percolate," spontaneously in K = 2 networks because a high proportion
of the 16 possible Boolean functions of K = 2 inputs belong to the special class
of "canalizing Boolean functions." If two elements regulated by canalizing Boolean
functions are coupled, one as the input to the second, then the probability that the
connection is a "forcing connection" is 0.5. This means that in a large network all
of whose elements are regulated by canalizing Boolean functions, on average half of
the connections are forcing connections.
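The claim that a high proportion of the 16 possible K = 2 functions are canalizing can be verified by enumeration. The sketch below counts 14 of the 16 as canalizing, everything except EXCLUSIVE OR and its negation (the two constant functions count as trivially canalizing under this test):

```python
from itertools import product

def canalizing(f):
    """f maps (a, b) -> 0/1.  Canalizing: some one input, at one of its
    two values, fixes the output regardless of the other input."""
    for v in (0, 1):
        if f[(v, 0)] == f[(v, 1)]:   # input a = v forces the output
            return True
        if f[(0, v)] == f[(1, v)]:   # input b = v forces the output
            return True
    return False

# All 16 K = 2 Boolean functions as truth-table dictionaries.
tables = [dict(zip([(0, 0), (0, 1), (1, 0), (1, 1)], outs))
          for outs in product((0, 1), repeat=4)]
n_canalizing = sum(canalizing(f) for f in tables)
```

The two exceptions are exactly the functions with no forcing value on either input, which is why random K = 2 networks are so rich in forcing connections.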
The expected size and structure of the resulting forcing structures is a math-
ematical problem in random graph theory.11,12,22,32,33,36,37,38 Percolation "thresh-
olds" occur in random graphs and determine when large connected webs of elements
will form. Below the threshold such structures do not form; above the threshold they
do. The percolation threshold for the existence of extended forcing structures in a
random Boolean network requires that the ratio of forcing connections to elements
be 1.0 or greater.31,33,36,37,38 Thus in large networks using elements regulated by
canalizing functions on two inputs, half the 2N connections are forcing. Therefore
the ratio of forcing connections to elements, N/N = 1, is high enough that extended
large forcing structures form. More generally, for K = 2 random networks and net-
works with K > 2, but restricted to canalizing functions, such forcing structures
form and literally crystallize a frozen state which induces orderly dynamics in the
entire network.
Because the percolation of a frozen component also accounts for the emergence
of order due to homogeneity clusters discussed just below, I defer for a moment de-
scribing how the frozen component due to either forcing structures or homogeneity
clusters induces orderly dynamics.
Low connectivity is a sufficient, but not a necessary condition for orderly behavior in
disordered switching systems. In networks of high connectivity, order emerges with
proper constraints on the class of Boolean switching rules utilized. One sufficient
condition is constraint to the class of canalizing Boolean functions and the perco-
lation of forcing structures across the network. But another sufficient condition for
order exists.
Consider a Boolean function of four input variables. Each input can be on or
off; hence the Boolean function must specify the response of the regulated switching
element for each of the 2^4 = 16 combinations of values of the four inputs. Among the 16
"responses," the 1 or the 0 response might occur equiprobably, or one may occur
far more often than the other. Let P be the fraction of the 2^K positions in the
function with a 1 response. If P is well above 0.5, and approaches 1.0, then most
combinations of activities of the four variables lead to a 1 response. The deviation
of P above 0.5 measures the "internal homogeneity" of the Boolean function.
In Figure 4 I show a two-dimensional lattice of points, each of which is an
on/off variable, and each of which is regulated by its four neighboring points. Each
is assigned at random one of the possible Boolean functions on four inputs, subject
to the constraint that the fraction of "1" values in that Boolean function is a
specified percentage, P, P > 0.5.
Derrida and Stauffer,6 Weisbuch and Stauffer,62 and de Arcangelis,3 summa-
rized in Stauffer57,58 and in Weisbuch,63 have studied two-dimensional and three-
dimensional lattices with nearest-neighbor coupling, and found that if P is larger
than a critical value, Pc, then the dynamical behavior of the network breaks up
into a connected "frozen" web of points fixed in the "1" value, and isolated islands
of connected points which are free to oscillate from 0 to 1 to 0, but are functionally
cut off from other such islands by the frozen web.
In contrast, if P is closer to 0.5 than Pc, then such a percolating web of points
fixed in "1" values does not form. Instead small isolated islands of frozen elements
form, and the remaining lattice is a single connected percolating web of elements
which oscillate between 1 and 0 in complex temporal cycles. In this case, transiently
altering the value, 1 or 0, of one point can propagate via neighboring points and
influence the behavior of most of the oscillating elements in the lattice.
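A direct simulation makes this P-dependence visible. Everything specific below (a 16 x 16 toroidal lattice, the warmup and measurement windows, the random seed, and the two sample values of P) is an arbitrary choice for the sketch, not a value from the studies cited; the function reports the fraction of sites that never change during the final measurement window.

```python
import random

def frozen_fraction(P, L=16, warmup=80, window=20, seed=0):
    """Fraction of sites in a random Boolean lattice that never change
    during the final `window` synchronous updates.  Each site is a 0/1
    variable regulated by its four nearest neighbors (periodic boundary)
    through a random truth table whose responses are 1 with probability P."""
    rng = random.Random(seed)
    rules = [[[int(rng.random() < P) for _ in range(16)]
              for _ in range(L)] for _ in range(L)]
    state = [[rng.randint(0, 1) for _ in range(L)] for _ in range(L)]

    def step(s):
        nxt = [[0] * L for _ in range(L)]
        for i in range(L):
            for j in range(L):
                # encode the four neighbor values as a 4-bit table index
                idx = (s[(i - 1) % L][j] * 8 + s[(i + 1) % L][j] * 4 +
                       s[i][(j - 1) % L] * 2 + s[i][(j + 1) % L])
                nxt[i][j] = rules[i][j][idx]
        return nxt

    for _ in range(warmup):          # let transients die out
        state = step(state)
    changed = [[False] * L for _ in range(L)]
    for _ in range(window):          # then watch for remaining activity
        new = step(state)
        for i in range(L):
            for j in range(L):
                changed[i][j] |= new[i][j] != state[i][j]
        state = new
    return sum(not c for row in changed for c in row) / (L * L)

high_bias = frozen_fraction(P=0.95)   # strongly biased toward "1"
low_bias = frozen_fraction(P=0.60)    # closer to the unbiased 0.5
```

With the bias well above the critical value most of the lattice freezes, while nearer 0.5 a much larger oscillating component survives, the two phases the text contrasts.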
These facts lead us to a new idea: The critical value of P, Pc, demarks a kind
of "phase transition" in the behavior of such a dynamical system. For P closer to
[Figure 4 appears here: a 24 x 24 lattice of sites, the number printed at each
site giving that site's periodicity on the state cycle attractor; the sites
marked "1" are frozen and form a cluster percolating across the lattice.]
FIGURE 4 Two-dimensional lattice of sites, each a binary state "spin" which may point
up or down. Each binary variable is coupled to its four neighbors and is governed by a
Boolean function on those four inputs. The number at each site shows the periodicity
of the site on the state cycle attractor. Thus "1" means a site frozen in the active or
inactive state. Note that frozen sites form a frozen cluster that percolates across the
lattice. Increasing the measure of internal homogeneity, P, the bias in favor of a "1" or
a "0" response by any single spin, above a critical value, Pc, leads to percolation of a
"frozen" component of spins, fixed in the 1 or 0 state, which spans the lattice leaving
isolated islands of spins free to vary between 0 and 1.
For P closer to 0.5 than Pc, the lattice of on/off variables, or two state "spins," has no percolating
frozen component. For P closer to 1.0 than Pc, the lattice of on/off variables, or
two state "spins," does have a large frozen component which percolates across the
space.
The arguments for the percolation of a frozen component for P > Pc do not
require that the favored value of each on/off "spin" variable in the lattice be 1.
The arguments carry over perfectly if half the on/off variables respond with high
probability, P > Pc, by assuming the 1 value and the other half respond with
P > Pc with the 0 value. In this generalized case, in the frozen web of "spins"
in the lattice, each frozen spin is frozen in its more probable value, 1 or 0. Thus,
for arbitrary Boolean lattices, P > Pc provides a criterion which separates two
drastically different behaviors, chaotic versus ordered.
The value of P at which this percolation and freezing out occurs depends
upon the kind of lattice, and increases as the number of neighbors to each point in
the lattice increases. On a square lattice for K = 4, Pc is 0.28.57,58,59 On a cubic
172 Stuart A. Kauffman
lattice, each point has six neighbors, and Pc is greater than for square lattices. This
reflects the fact that the fraction of "bonds" in a lattice which must be in a fixed
state for that fixed value to percolate across the lattice, depends upon the number
of neighbors to each point in the lattice.
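This freezing behavior can be seen in a small simulation. The sketch below is my own construction, not Kauffman's code: it builds a square lattice of K = 4 random Boolean rules whose truth-table entries are biased toward 1 with probability P, runs past a transient, and measures the fraction of sites that never change over an observation window. Pushing P toward 1.0 freezes essentially the whole lattice, while P near 0.5 leaves most sites churning.

```python
import random

def make_lattice(L, P, seed=0):
    # Each site gets a random Boolean function of its 4 neighbors: a
    # 16-entry lookup table whose entries are 1 with probability P
    # (the internal-homogeneity bias).
    rng = random.Random(seed)
    return [[tuple(1 if rng.random() < P else 0 for _ in range(16))
             for _ in range(L)] for _ in range(L)]

def step(state, rules, L):
    # Synchronous update; periodic boundary conditions.
    new = [[0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            idx = (state[(i - 1) % L][j] << 3 | state[(i + 1) % L][j] << 2
                   | state[i][(j - 1) % L] << 1 | state[i][(j + 1) % L])
            new[i][j] = rules[i][j][idx]
    return new

def frozen_fraction(L, P, transient=50, window=30, seed=0):
    # Fraction of sites that never change state over the window.
    rng = random.Random(seed + 1)
    rules = make_lattice(L, P, seed)
    state = [[rng.randint(0, 1) for _ in range(L)] for _ in range(L)]
    for _ in range(transient):
        state = step(state, rules, L)
    changed = [[False] * L for _ in range(L)]
    for _ in range(window):
        nxt = step(state, rules, L)
        for i in range(L):
            for j in range(L):
                if nxt[i][j] != state[i][j]:
                    changed[i][j] = True
        state = nxt
    return sum(not changed[i][j] for i in range(L) for j in range(L)) / L ** 2
```

Here "frozen" is judged over a finite window rather than on the exact attractor, which is a crude but serviceable proxy for small lattices.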
Let me call such percolating frozen components for P > Pc homogeneity clus-
ters to distinguish them from extended forcing structures. I choose this name be-
cause freezing in this case depends upon the internal homogeneity of the Boolean
functions used in the network. That the two classes of objects are different in gen-
eral is clear: In forcing structures the characteristic feature is that at each point a
single value of an element alone suffices to force one or more descendent elements to
their own forced values. In contrast, homogeneity clusters are more general. Thus,
consider two pairs of elements, A1, A2 and B1, B2. A1 and A2 might receive inputs
from both B1 and B2 as well as other elements, while B1 and B2 receive inputs
from A1 and A2 as well as other elements. But due to the high internal homogeneity,
P > Pc, of the Boolean functions assigned to each, simultaneous 1 values by both
A1 and A2 might jointly guarantee that B1 and B2 each be active regardless of the
activities of other inputs to B1 and B2. At the same time, simultaneous 1 values
by both B1 and B2 might jointly guarantee that A1 and A2 be active regardless
of the activities of other inputs to A1 and A2. Once the four elements are jointly
active, they mutually guarantee their continued activity regardless of the behavior
of other inputs to the four. They form a frozen component. Yet it is not a forcing
component since the activity of two elements, A1 and A2, or B1 and B2, must be
jointly assured to guarantee the activity of any single element.
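The four-element cluster just described can be checked exhaustively. In this sketch the rules are hypothetical ones chosen to illustrate the argument: each element has one "external" input e besides the pair shown, and the pair A1 AND A2 (or B1 AND B2) alone guarantees activity regardless of e.

```python
from itertools import product

def B_rule(a1, a2, e):
    # e matters only when A1 AND A2 fails: no single input forces B
    return 1 if (a1 and a2) else e

def A_rule(b1, b2, e):
    return 1 if (b1 and b2) else e

def update(state, ext):
    a1, a2, b1, b2 = state
    ea1, ea2, eb1, eb2 = ext
    return (A_rule(b1, b2, ea1), A_rule(b1, b2, ea2),
            B_rule(a1, a2, eb1), B_rule(a1, a2, eb2))

# Once all four elements are active, they stay active under every
# combination of external inputs: a frozen component.
frozen = all(update((1, 1, 1, 1), ext) == (1, 1, 1, 1)
             for ext in product((0, 1), repeat=4))
```

Note that B_rule(1, 0, e) still depends on e, so no single element alone forces another; the component is frozen without being a forcing structure.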
While there appear to be certain differences between forcing structures and
homogeneity clusters, those differences are far less important than the fact that,
at present, the two are the only established means to obtain orderly dynamics in
large, disordered Boolean networks.
Whether percolation of a frozen phase is due to an extended forcing structure
or to a homogeneity cluster due to P > Pc, the implications include these:
1. If a frozen phase does not form:
a. The attractors in such a system are very large, and grow exponentially as
the number of points in the lattice, N, increases. Indeed, the attractors are
so large that the system can be said to behave chaotically.
b. As indicated, a minor alteration in the state of the lattice, say, "flipping"
one element from the 1 to the 0 value at a given instant, propagates al-
terations in behavior throughout the system. More precisely, consider two
identical lattices which differ only in the value of one "spin" at a moment,
T. Let each of the two lattices behave dynamically according to their identical
Boolean rules. Define the "damage" caused by the initial "spin flip" to be
the total number of sites in the lattices which at the succession of time mo-
ments are now induced to be in different states, 1 or 0. Then for P closer to
0.5 than Pc, such damage propagates across the lattice with a finite speed,
and a large fraction of the sites are damaged.6,48,56,57,58,62 Propagation of
neighbor of all those which alter the beginning or end of a single "input" connection,
or a single "bit" in a single Boolean function.
In considering program space I defined a fitness landscape as the distribution
over the space of the figure of merit, consisting in a measurable property of those
programs. This leads us to examine the statistical features of such fitness landscapes,
including their correlation structure, the numbers of local optima, the lengths of walks
to optima via fitter 1-mutant variants, the number of optima accessible from any
point, and so forth. Similarly, in considering adaptation in Boolean network space,
any specific measurable property of such networks yields a fitness landscape over
the space of systems. Again we can ask what the structure of such landscapes looks
like.
I shall choose to define the fitness of a Boolean network in terms of a steady
target pattern of activity and inactivity among the N elements of the network. This
target is the (arbitrary) goal of adaptation. Any network has a finite number of
state cycle attractors. I shall define the fitness of any specific network by the match
of the target pattern to the closest state on any of the net's state cycles. A perfect
match yields a normalized fitness of 1.0. More generally, the fitness is the fraction
of the N elements which match the target pattern.
In previous work, Kauffman and Levin studied adaptive evolution on fully
uncorrelated landscapes. More recently,44,43 my colleagues and I introduced and
discussed a spin-glass-like family of rugged landscapes called the NK model. In this
family, each site, or spin, in a system of N sites, makes a fitness contribution which
depends upon that site, and upon K other randomly chosen sites. Each site has
two alternative states, 1 or 0. The fitness contribution of each site is assigned at
random from the uniform distribution between 0.0 and 1.0 for each combination of
the 2^(K+1) states of the K + 1 sites which bear on that site. The fitness of a given
configuration of N site values, (110100010), is defined as the mean of the fitness
contributions of each of the sites. Thus, this model is a kind of K-spin spin glass,
in which an analogue of the energy of each spin configuration depends, at each
site, on interactions with K other sites. In this model, when K = 0, the landscape
has a single peak, the global optimum, and the landscape is smoothly correlated.
When K is N - 1, each site interacts with all sites, and the fitness landscape is fully
random. This limit corresponds to Derrida's random-energy spin glass model.4,5,6
Two major regimes exist, K proportional to N, and K of order 1. In the former,
landscapes are extremely rugged, and local optima fall toward the mean of the space
as N increases. In the latter, there are many optima, but they do not fall toward
the mean of the space as N increases. For K = 2, the highest optima cluster near
one another.
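A minimal sketch of the NK model as defined above: random epistatic inputs and uniform [0, 1) contribution tables, with the fitness of a configuration the mean of per-site contributions, and an adaptive walk through fitter 1-mutant variants to a local optimum. The function names are mine, not from the text. For K = 0 every walk ends at the single global peak; for larger K many local optima appear.

```python
import random
from itertools import product

def make_nk(N, K, seed=0):
    """Random NK landscape: site i contributes a uniform [0, 1) fitness
    value for each combination of site i and its K epistatic inputs."""
    rng = random.Random(seed)
    deps = [[i] + rng.sample([j for j in range(N) if j != i], K)
            for i in range(N)]
    tables = [{bits: rng.random() for bits in product((0, 1), repeat=K + 1)}
              for _ in range(N)]
    def fitness(config):
        return sum(tables[i][tuple(config[j] for j in deps[i])]
                   for i in range(N)) / N
    return fitness

def hill_climb(fitness, N, seed=0):
    """Adaptive walk via fitter 1-mutant variants to a local optimum."""
    rng = random.Random(seed)
    config = [rng.randint(0, 1) for _ in range(N)]
    improved = True
    while improved:
        improved = False
        for i in rng.sample(range(N), N):   # try sites in random order
            trial = config[:]
            trial[i] ^= 1                   # flip one "spin"
            if fitness(trial) > fitness(config):
                config, improved = trial, True
    return config, fitness(config)
```

With K = 0 the sites are independent, so walks from different starting points converge on the same global optimum; with K near N - 1 the landscape is effectively random and walks stop at nearby local optima.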
Such rugged landscapes exhibit a number of general properties. Among them,
there is a "universal law" for long jump adaptation. In "long jump" adaptation
members of an adapting population can mutate a large number of genes at once,
hence jump a long way across the landscape at once. Frame-shift mutations are
examples. In long jump adaptation the waiting time to find fitter variants doubles
after each fitter variant is found, hence the mean number of improvement steps,
S, grows as the logarithm base 2 of the number of generations. Further, there is
FIGURE 5 (a) Tests of the "Universal Law" for long jump adaptation. Figures show
cumulative number of improvement steps following mutations of half the connections
in K = 2, N = 50 and N = 100 element Boolean nets in each member of the
population—except for a "current best" place holder—plotted against the logarithm of
the generation at which the improvement occurred. Each walk yields a sequence of
generations at which an improvement step arose. Means of observed values are plotted,
as well as theoretical expectations. (b) 1/4 of all "bits" in the Boolean functions within
each member of the population of N = 50 or N = 100 networks were reversed at
each generation as a "long jump" mutation in the logic of the network. From Origins
of Order: Self Organization in Evolution by S. A. Kauffman. Copyright © 1990 by
Oxford University Press, Inc. Reprinted by permission.
[Figure 6 and Figure 7 plots: mean fitness versus generation (0 to 100), panels for K = 2 and K = 10 networks with F = 1, 2, 5 and C = 1, 2, 5 mutations per generation.]
FIGURE 7 (a) As in Figure 5, except that K = 10. (b) Same as (a), except that the
connections were mutated in networks. From Origins of Order: Self Organization
in Evolution by S. A. Kauffman. Copyright © 1990 by Oxford University Press, Inc.
Reprinted by permission.
FIGURE 8 The fitnesses of 1, 2, and 5 mutant variants of the fittest network found
after adaptive hill climbing in K = 2 networks (a-b), or K = 10 networks (c-d). "F"
refers to mutations to "bits" in Boolean functions, "C" to mutations of connections. From
Origins of Order: Self Organization in Evolution by S. A. Kauffman. Copyright ©
1990 by Oxford University Press, Inc. Reprinted by permission.
As the complexity of the entities under selection increases, here the number of binary
switching variables in a disordered Boolean network, the attainable optima again
fall toward the mean of the space. We do not know at this stage just how general this
complexity catastrophe limiting the power of selection when operating on complex
systems may be, but it appears likely to be a powerful factor in evolution. Finally,
Boolean networks of different connectivities, K = 2 and K = 10, clearly adapt
on radically different landscapes. The capacity to attain and maintain high fitness
depends upon landscape structure, mutation rate, and coevolutionary couplings of
landscapes. It follows that dynamical systems in different classes, constructed in
different broad ways, can have very different capacities to adapt. Tentatively, it
appears that Boolean nets of low connectivity are likely to adapt more readily than
those of high connectivity.
Among the themes to be investigated in understanding the relation between
self-organization and selection, is the extent to which selection can achieve systems
whose behavior is very untypical of those in the ensemble in which adaptive evolu-
tion is occurring. In the current context, can selection operate on Boolean networks
with K = 20 inputs and N = 10,000, and achieve networks with short stable at-
tractors? The answer is unknown. But since the generic properties of this class of
random Boolean networks include attractors which scale exponentially in N, and
are grossly unstable to minimal perturbations, one doubts strongly that selection
could achieve such systems within the N = 10,000 K = 20 ensemble. But, if the
structure of such networks governs the sizes and stability of their attractors, it
also governs the ruggedness of the fitness landscapes upon which they evolve. If
selection can "tune" K in such networks, or bias the choice of Boolean functions in
such networks, then selection can change the ensemble being explored by evolution.
Such changes would tune the landscape structure of the systems, hence their evolv-
ability. The fact that the K = 2 and canalizing ensemble fits so many features of
organisms, and that organisms are themselves now clearly evolvable, suggests that
this ensemble may itself have been achieved by selection in part to achieve evolv-
ability. In the next section we turn to ask what features of fitness landscapes, and
of the couplings through which landscapes deform as partners adapt, abet coevolution.
moves by one coevolutionary partner cause the fitness landscapes of its partners
to deform more or less drastically. It is a story of coupled dancing landscapes. On a
fixed fitness landscape there is the analogue of a potential function: the fitness at
each point. In coevolution, no such potential function is present. Thus we can frame
the following questions: 1) How are fitness landscapes coupled? 2) What kinds of
couplings between landscapes allow the partners to dance happily and typically
achieve "high fitness?" 3) Might there be evolutionary processes which alter the
couplings among landscapes and the landscape structure of each partner, such that
the entire system coevolves "well" or optimally in some sense?
Answers are not known, of course. I describe briefly some preliminary work
carried out with my colleague Sonke Johnsen using the spin-glass-like NK model of
fitness landscapes. As noted briefly above, the NK model consists of N spins, each
in two states, 1 or 0. Each spin makes a "fitness contribution" to the "organism"
which depends upon the value at that spin site, and at K other randomly chosen
sites. In our coevolutionary model, we consider a system with S organisms, one for
each of S species. Each species interacts with R neighboring species. Each site in
each species makes a fitness contribution which depends upon K sites within that
species member, and "C" sites in each of the R species with which it interacts. The
fitness contribution of each site therefore depends upon K + 1 + R*C sites, each of
which can be in the 1 or 0 state. The model assigns to each site a fitness contribution
at random from the uniform interval between 0.0 and 1.0, for each of the 2^(K+1+R*C)
combinations of these site values. In an extension to the model, each species also
interacts with an external world of N sites, of which W affect each of the species'
own sites. Thus, the coevolutionary model is a kind of coupled spin system. Each
species is represented by a collection of N spins. Spins are K coupled within each
species, and C coupled between species. The fitness of any species, whose current
state is given by the values of its N spins, depends upon the states of those spins
and those in its R neighbors which impinge upon it, and is the mean of the fitness
contribution of the species' own N sites.
Consider a "square" 10 x 10 ecosystem with 100 species each of which interacts
with its four neighbors. Corner species interact with only two neighbors, edge species
interact with three neighbors. Each species "plays" in turn by flipping each of its
N spins, one at a time, and ascertaining if any 1-mutant variant is fitter than the
current spin configuration of that species. If so, the species randomly chooses one of
the fitter variants and "moves" there. Each of the 100 players plays in turn, in order.
100 plays constitutes an ecosystem generation. After each ecosystem generation, a
species may have changed spin configuration, or may not have changed. If the
species changed, color it blue. If it remained fixed, color it red. Over time the
system will continue to change unless all members stop changing, and the whole
system becomes frozen in a "red" state. Such a state corresponds to a local Nash
equilibrium in game theory. Each player is at a local (1-mutant) optimum consistent
with the local optima of its R neighbors.
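The coevolutionary dynamics can be sketched in miniature. The code below is a simplification of my own: it puts S species on a ring rather than the 10 x 10 grid, and replaces the stored contribution tables with a deterministic hash-based stand-in. Each species in turn moves to a random fitter 1-mutant variant; the return value is the ecosystem generation at which the system froze into a local Nash equilibrium, or None if it kept changing.

```python
import random

def contrib(key):
    # Deterministic stand-in for a stored table of uniform [0, 1) fitness
    # contributions, one entry per (species, site, relevant bit values).
    return random.Random(hash(key)).random()

def fitness(sp, configs, deps):
    N = len(configs[sp])
    return sum(contrib((sp, site) + tuple(configs[d_sp][d_site]
                                          for d_sp, d_site in deps[sp][site]))
               for site in range(N)) / N

def coevolve(S, N, K, C, seed=0, max_gen=200):
    """S species on a ring, each N spins; a site's contribution depends on
    itself, K other sites of its own species, and C sites in each of its
    two neighboring species."""
    rng = random.Random(seed)
    deps = [[[(sp, site)]
             + [(sp, j) for j in rng.sample(
                 [j for j in range(N) if j != site], K)]
             + [(nb, j) for nb in ((sp - 1) % S, (sp + 1) % S)
                for j in rng.sample(range(N), C)]
             for site in range(N)] for sp in range(S)]
    configs = [[rng.randint(0, 1) for _ in range(N)] for _ in range(S)]
    for gen in range(1, max_gen + 1):
        moved = False
        for sp in range(S):                    # each species plays in turn
            base = fitness(sp, configs, deps)
            better = []
            for i in range(N):
                configs[sp][i] ^= 1
                if fitness(sp, configs, deps) > base:
                    better.append(i)
                configs[sp][i] ^= 1
            if better:                         # move to a random fitter variant
                configs[sp][rng.choice(better)] ^= 1
                moved = True
        if not moved:
            return gen   # frozen "red" state: a local Nash equilibrium
    return None
```

With C = 0 the species climb independent NK landscapes and must freeze; with C > 0 each move deforms the neighbors' landscapes, and whether and how fast a Nash equilibrium is reached depends on K relative to R*C, as described in the text.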
Recall that increasing K increases the ruggedness of these NK landscapes.
We find the following remarkable result: When K is large relative to R * C, then
over ecosystem generations frozen red regions form, grow, and percolate across the
ecosystem. At first these red frozen components leave behind blue islands of species
which continue to undergo coevolutionary change. Eventually, the entire system
becomes frozen in a red Nash equilibrium. In short, frozen components recur on
this larger scale of coupled spin systems, in direct analogy with those found in
Boolean networks. The number of ecosystem generations required for the frozen
component to spread across the ecosystem increases dramatically when K is less
than R * C.
Tuning the parameters of the coupled spin model, N, K, C, S, and the number
of sites which can "flip" or mutate at once in each species, not only tunes the
mean time to reach a Nash equilibrium, but also tunes the mean fitness of the
coevolving partners. While full results are not yet available, it appears that in any
model ecosystem, there is an optimal value of K. When K is too small relative to
R * C, the landscape of each partner is too "smooth," and the effect of altering a
site internal to a species upon its fitness is too small with respect to the impact
of site alterations in other species to withstand those exogenous perturbations to
landscape structure. The waiting time to reach the frozen Nash equilibrium is long,
and sustained fitness is low. Conversely, if K is too high, Nash equilibria are rapidly
attained, but the high K value implies many conflicting constraints; thus the fitness
of the local optima which comprise the Nash are low. Again, sustained fitness is
low. An optimal value of K optimizes the waiting time to find Nash equilibria such
that the sustained fitness is itself optimized.
It is also important that an evolutionary process guided by natural selection
acting on members of individual species may lead partners to "tune" K to the opti-
mum. For each partner in a system where each has a suboptimal or overoptimal K
value, any single partner improves its own sustained fitness by increasing or lowering
its K value toward the optimal value. Thus natural selection, acting on members
of individual species to tune the ruggedness of their own fitness landscapes, may
optimize coevolution for an entire coupled system of interacting adapting species.
Real coevolution confronts not only adaptive moves by coevolving partners, but
exogenous changes in the external "world" impinging upon each partner. The cou-
pled NK landscape model suggests that if each partner is occasionally shocked by
a change in its external world, then sustained fitness may be optimized by increas-
ing K slightly. In this case, the coevolving system as a whole tends to restore the
red frozen Nash equilibria more rapidly in the face of external perturbations which
destabilize the system.
Finally, it has been of interest to study the distribution of coevolutionary
avalanches unleashed by changing the external "world" of species when the entire
system is at a frozen Nash equilibrium. Small and large avalanches of coevolution-
ary change propagate across the system. To a first approximation, when the K
value is optimized to maximize sustained fitness, the distribution of avalanche sizes
appears to be linear in a log-log plot, suggesting a power law distribution. If so,
the self-optimized ecosystem may harbor a self-organized critical state of the kind
recently investigated by Bak in other contexts. Interestingly, the distribution of
such avalanches in these model ecosystems mirrors the distribution of extinction
events in the evolutionary record.
These results are first hints that coevolving systems may tune the structure of
their internal landscapes and the coupling between landscapes under the aegis of
natural selection such that the coupled system coadapts well as a whole. No mean
result this, if true.
SUMMARY
What kinds of dynamical systems harbor the capacity to accumulate useful varia-
tions, hence evolve? How do such systems interact with their "worlds," in the sense
of categorizing their worlds, acting upon those categorizations, and evolving as their
worlds, with other players, themselves evolve? No one knows. The following is clear.
Adaptive evolution, whether by mutation and selection, or learning, or otherwise,
occurs on some kind of "fitness landscape." This follows because adaptation or
learning is some kind of local search in a large space of possibilities. Further, in
any coevolutionary context, fitness landscapes deform because they are coupled.
The structure and couplings among landscapes reflect the kinds of entities which
are evolving and their couplings. Natural selection or learning may tune both such
structures and couplings to achieve systems which are evolvable.
A further point is clear. Complex, parallel-processing Boolean networks which
are disordered can exhibit ordered behavior. Such networks are reasonable models
of a large class of nonlinear dynamical systems. The attractors of such networks
are natural objects of interest. In the present article I have interpreted attractors
as "cell types." But equally, consider a Boolean network receiving inputs from an
external world. The attractors of a network are the natural classifications that the
network makes of the external world. Thus, if the world can be in a single state, yet
the network can fall to different attractors, then the network can categorize that
state of the world in alternative ways and respond in alternative ways to a single
fixed state of the external world. Alternatively, if the world can be in alternative
states, yet the network falls to the same attractor, then the network categorizes
the alternative states of the world as identical, and can respond in the same way.
In brief, and inevitably, nonlinear dynamical systems which interact with external
worlds classify and "know" their worlds.
Linking what we have discussed, and guessing ahead, I suspect that if we could
find natural ways to model coevolution among Boolean networks which received in-
puts from one another and external worlds, we would find that such systems tuned
their internal structures and couplings to one another so as to optimize something
like their evolvability. An intuitive bet is that such systems would achieve internal
structures in which the frozen components were nearly melted. Such structures live
on the edge of chaos, in the "liquid" interface suggested by Langton,49 where com-
plex computation can be achieved. In addition, I would bet that couplings among
entities would be tuned such that the red frozen Nash equilibria are tenuously held
to optimize fitness of all coevolving partners in the face of exogenous perturbations.
REFERENCES
1. Alberts, B., D. Bray, J. Lewis, M. Raff, K. Roberts, and J. D. Watson. Molec-
ular Biology of the Cell. New York: Garland, 1983.
2. Bak, P., C. Tang, and K. Wiesenfeld. "Self-Organized Criticality." Phys. Rev.
A 38(1) (1988):364-374.
3. De Arcangelis, L. "Fractal Dimensions in Three-Dimensional Kauffman Cellu-
lar Automata." J. Phys. A. Lett. 20 (1987):L369-L373.
4. Derrida, B., and H. Flyvbjerg. "Multivalley Structure in Kauffman's Model:
Analogy with Spin Glasses." J. Phys. A: Math. Gen. 19 (1986):L1003-L1008.
5. Derrida, B., and Y. Pomeau. "Random Networks of Automata: A Simple An-
nealed Approximation." Europhys. Lett. 1(2) (1986):45-49.
6. Derrida, B., and D. Stauffer. "Phase-Transitions in Two-Dimensional Kauff-
man Cellular Automata." Europhys. Lett. 2(10) (1986):739-745.
7. Derrida, B., and H. Flyvbjerg. "The Random Map Model: A Disordered
Model with Deterministic Dynamics." J. Physique 48 (1987):971-978.
8. Derrida, B., and H. Flyvbjerg. "Distribution of Local Magnetizations in Ran-
dom Networks of Automata." J. Phys. A. Lett. 20 (1987):L1107-L1112.
9. Eigen, M. "New Concepts for Dealing With the Evolution of Nucleic Acids."
In Cold Spring Harbor Symposia on Quantitative Biology, vol. LII. Cold
Spring Harbor Laboratory, 1987, 307-320.
10. Eigen, M., and P. Schuster. The Hypercycle, A Principle of Natural Self-
Organization. New York: Springer-Verlag, 1979.
11. Erdos, P., and A. Renyi. On the Random Graphs I, vol. 6. Debrecen, Hun-
gary: Inst. Math. Univ. Debreceniens, 1959.
12. Erdos, P., and A. Renyi. "On the Evolution of Random Graphs." Math. Inst.
Hung. Acad. Sci., Publ. No. 5, 1960.
13. Farmer, J. D., K. S. Kauffman, and N. H. Packard. "Autocatalytic Replica-
tion of Polymers." Physica 22D (1986):50-67.
14. Farmer, J. D., N. H. Packard, and A. Perelson. "The Immune System, Adap-
tation, and Machine Learning." Physica 22D (1986):187-204.
15. Feller, W. Introduction to Probability Theory and its Applications, vol. II, 2nd
edition. New York: Wiley, 1971.
16. Flyvberg, H., and N. J. Kjaer. "Exact Solution of Kauffman's Model with
Connectivity One." J. Phys. A. 21(7) (1988):1695-1718.
54. Rumelhart, D. E., J. L. McClelland, and the PDP Research Group. Parallel
Distributed Processing: Explorations in the Microstructure of Cognition, vols.
I and II. Cambridge, MA: Bradford, 1986.
55. Schuster, P. "Structure and Dynamics of Replication-Mutation Systems."
Physica Scripta 26B (1987):27-41.
56. Stanley, H. E., D. Stauffer, J. Kertesz, and H. J. Herrmann. Phys. Rev. Lett.
59 1987.
57. Stauffer, D. "Random Boolean Networks: Analogy with Percolation." Philo-
sophical Magazine B 56(6) (1987):901-916.
58. Stauffer, D. "On Forcing Functions in Kauffman's Random Boolean Net-
works." J. Stat. Phys. 40 (1987):789.
59. Stauffer, D. "Percolation Thresholds in Square-Lattice Kauffman Model." J.
Theor. Biol., in press.
60. Walter, C., R. Parker, and M. Ycas. J. Theor. Biol. 15 (1967):208.
61. Weisbuch, G. J. Phys. 48 (1987):11.
62. Weisbuch, G., and D. Stauffer. "Phase Transition in Cellular Random
Boolean Nets." J. Physique 48 (1987):11-18.
63. Weisbuch, G. Dynamics of Complex Systems: An Introduction to
Networks of Automata. Paris: InterEditions, 1989.
64. Wolfram, S. "Statistical Mechanics of Cellular Automata." Rev. Mod. Phys.
55 (1983):601.
Seth Lloyd
Division of Physics and Astronomy, California Institute of Technology, Pasadena, CA 91125
and the Santa Fe Institute, 1120 Canyon Road, Santa Fe, NM 87501
Valuable Information
QUANTIFYING EXPERIENCE
Consider how the set of genes for a particular species of clover changes from one
season to the next. Next season's genes are made up from this season's genes by a
process of mutation and recombination. Given this spring's gene pool, next spring's
gene pool may be any one of a large number of gene sets. Which genes are repre-
sented next spring depends not only on the details of reproduction, but on which
individual plants in this year's crop are most attractive to pollen-bearing bees, on
the local densities of clover population, on rainfall, etc. In short, next season's gene
pool depends on a wide variety of factors, ranging from the relative viability of
individuals within the species to dumb luck.
Although the vagaries of bees, weather, and chance are beyond our control,
information theory allows us to put a measure to the amount of selection that
they effect. Given mutation rates and the rules of recombination, one can calculate
the a priori probabilities for different genetic combinations in the next generation
of clover, given the genetic make-up of the present generation. This probability
distribution has a certain entropy, call it S(next | present). Weather, chance and
bees conspire to pick out a particular set of clover genes for the next generation. In
doing so, they supply the species with an amount of information equal to
S(next | present).
The total amount of information supplied to the species over n generations
can be quantified as follows. Let x1, x2, ... label the possible genetic configurations
of the first generation, second generation, etc. Let p(x1) be the probability that
the actual configuration of genes of the first generation is x1. Let p(x2 | x1) be
the a priori probability that the second generation has configuration x2 given that
the first generation had configuration x1. Then p(x1x2) = p(x1)p(x2 | x1) is the
probability of the sequence of configurations x1x2 over the first two generations.
Define p(x3 | x1x2), p(x1x2x3) = p(x1x2)p(x3 | x1x2), etc., in a similar fashion.
The information supplied by the environment in picking out the actual genetic
configuration, x'1, of the first generation is then S1 = -Σx1 p(x1) log p(x1). The
amount of information supplied in picking out the actual configuration of the second
generation, x'2, is S2|x'1 = -Σx2 p(x2 | x'1) log p(x2 | x'1), and so on. The total
amount of information supplied to the species by the environment over n generations
is then

Stot = S1 + S2|x'1 + ... + Sn|x'1x'2...x'n-1.
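Averaged over the earlier generations, this additivity of per-generation information is just the chain rule for entropy: the terms sum to the entropy of the whole trajectory. A toy two-generation example, with hypothetical numbers and entropies in bits:

```python
from math import log2

states = ['a', 'b']                          # two possible configurations
p1 = {'a': 0.75, 'b': 0.25}                  # p(x1), hypothetical numbers
p2 = {('a', 'a'): 0.5, ('a', 'b'): 0.5,      # p(x2 | x1)
      ('b', 'a'): 0.9, ('b', 'b'): 0.1}

S1 = -sum(p1[x] * log2(p1[x]) for x in states)
# Conditional entropy of the second generation, averaged over the first
S2 = -sum(p1[x1] * p2[(x1, x2)] * log2(p2[(x1, x2)])
          for x1 in states for x2 in states)
Stot = S1 + S2

# Chain rule: the per-generation terms sum to the entropy of the whole
# two-generation trajectory, p(x1 x2) = p(x1) p(x2 | x1).
joint = {(x1, x2): p1[x1] * p2[(x1, x2)] for x1 in states for x2 in states}
Sjoint = -sum(p * log2(p) for p in joint.values())
```

(The text's S2|x'1 is conditioned on the actual first-generation configuration; averaging over x1, as here, gives the expected information supply and makes the chain-rule identity exact.)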
The explicit expression for Stot in terms of probabilities is not very illumi-
nating. To get a more suggestive form for the information supplied, we turn to
coding theory. Suppose that we want to associate each possible genetic config-
uration with a binary number—that is, we want to encode the influence of the
environment on the species. Coding theory implies that in the most efficient self-
delimiting encoding (Huffman code),4 the length of the message associated with
the genetic configuration x'1 is -log2 p(x'1). Similarly, given that the first config-
uration was x'1, the length of the message that codes for x'2 is -log2 p(x'2 | x'1) =
-log2 p(x'1x'2) + log2 p(x'1).
In the most efficient coding, the sum of the lengths of the messages that encode
for x'1, x'2, ..., x'n at each stage is then

-log2 p(x'1) - log2 p(x'2 | x'1) - ... - log2 p(x'n | x'1 ... x'n-1) = -log2 p(x'1x'2 ... x'n),

which is simply the length of the message that encodes for the trajectory
x'1, x'2, ..., x'n in the most efficient coding over genetic trajectories. Define C_{x'1...x'n}
= -log2 p(x'1 ... x'n) to be the cost of the trajectory x'1 ... x'n.
In the case of clover, C_{x'1...x'n} measures the amount of information supplied to
the species by the environment over n generations. But cost can be applied to any
other system for which there exists a natural probability distribution over the set
of the system's different possible trajectories. Consider, for example, the cost of
computation. Suppose that one assigns equal probability to all sequences of zeros
and ones as programs for a binary computer.2 The probability that the first m bits
of such a randomly selected program happen to match a given m-bit program is 2^-m,
and the cost of the trajectory that the program encodes is C_m = -log2 2^-m = m.
The cost of executing a given program is simply equal to its length.
Cost as defined is a function of process. To assign a cost to a particular piece of
information, one identifies the various processes that can result in that information.
Each such process, g, has a cost, C_g = -log p(g). The information's cost can
either be identified with the average cost of the processes that result in it,
C = -Σg p(g) log p(g), or with its minimum cost, which is just the cost of the
most likely such process. If the piece of information is a number, and the processes
are the various computations that result in that number, then the number's minimum
cost is just its algorithmic complexity—the length of the shortest algorithm that
produces that number as output.
196 Seth Lloyd
A MODEL
Consider a hypothetical two-sexed, monogamous species with N members, in which
each couple mates once each year, resulting in two offspring. There are N/2 mem-
bers of each sex, and (N/2)! different possible combinations of couples. Half of each
offspring's genes come from the mother, half from the father, with recombination
occurring at M sites, for a total of M!/((M/2)!)^2 possible genetic combinations for
each offspring, given the genetic make-up of the parents. The total amount of in-
formation involved in picking out one combination of couples and one combination
of genes for each offspring is thus
log2 [ (N/2)! ( M!/((M/2)!)^2 )^N ] ≈ (N/2) log2(N/2e) + NM,

by Stirling's formula.
If N is a million and M is on the order of 30, the cost per generation is roughly
40 million bits, with the amounts of information that go into mate selection and
into recombination of comparable magnitudes.
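This estimate is easy to reproduce. The sketch below compares the exact bit count, log2[(N/2)!] + N log2[M!/((M/2)!)^2], with the Stirling approximation (N/2) log2(N/2e) + NM; both come out at a few tens of millions of bits, consistent with the rough figure of 40 million:

```python
import math

N = 1_000_000   # population size
M = 30          # recombination sites

def log2_factorial(k):
    # log2(k!) computed via the log-gamma function
    return math.lgamma(k + 1) / math.log(2)

mate_bits = log2_factorial(N // 2)                        # log2[(N/2)!]
recomb_bits = N * (log2_factorial(M) - 2 * log2_factorial(M // 2))
exact = mate_bits + recomb_bits

# Stirling-formula estimate quoted in the text
approx = (N / 2) * math.log2(N / (2 * math.e)) + N * M

print(round(exact / 1e6, 1), round(approx / 1e6, 1))  # millions of bits
```

The mate-selection term contributes roughly 9 million bits and recombination the rest, so the two contributions are indeed of comparable magnitude.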
The amount of information required to specify the effects of mutation can easily
be included in this model. If p bits are required, on average, to describe the positions
and types of mutations in a given individual per generation, then Np bits are
added to the total cost. If the location of each recombination site can vary by q
base pairs along the gene, then M log2 q bits are added to the cost.
Valuable Information 197
PROSPECTUS
Cost is a measure of the amount of information required by a process. Unless one
adopts a labor theory of value that ignores demand, the cost of a process that
produces a piece of information does not equal its value. Information's value should
depend on demand as well as supply—value should reflect usefulness. In addition,
cost does not capture distinctions between different sorts of information: as defined,
cost gives equal weight both to random information supplied by mutation and to
ordered information supplied by selective pressure. A more comprehensive measure
might discount random information, while retaining the contribution of ordered
information from the environment.7
Nevertheless, cost is the obvious measure of the total amount of information
that needs to be supplied to a system in the course of a given process. As con-
firmation, cost reduces to program length when specialized to computation. And,
although the cost of a piece of information may not determine its value, the genetic
cost of the evolution of a species can be tens of millions of bits of non-random
information per generation, or much more. One might wish to be careful about
bringing such a species to extinction, in the event that one must pay for what one breaks.
ACKNOWLEDGMENTS
Work supported in part by the U.S. Department of Energy under Contract No.
DE-AC03-81ER40050.
REFERENCES
1. Adams, J. C. Manuscript Nos. 1841-1846, St. John's College Library, Cambridge University.
2. Bennett, C. H. Intl. J. Theor. Phys. 21 (1982):905.
3. Frautschi, S. Science 217 (1982):593.
4. Hamming, R. W. Coding and Information Theory. Englewood Cliffs: Prentice
Hall, 1986.
5. Le Verrier, U. J. J. C. R. Acad. Sci. 21 (1845):1050.
6. Lloyd, S., and H. R. Pagels. Ann. Phys. 188 (1988):186.
7. Lloyd, S., and H. R. Pagels. To be published.
8. Shannon, C. E., and W. Weaver. The Mathematical Theory of Communica-
tion. Urbana: University of Illinois Press, 1949.
Dilip K. Kondepudi
Department of Chemistry, Box 7486, Wake Forest University, Winston-Salem, NC 27109
INTRODUCTION
In polymers, nature realizes information-carrying sequences in a simple way. In the
last three decades, molecular biology has revealed to us how information is carried
in DNA sequences and how this information is translated into proteins that have a
definite function. We have many details of these awesome, complex processes, but
we have only a poor understanding of how and when such information-processing
systems will spontaneously evolve. The questions regarding the spontaneous
evolution of information-processing systems are more general than the question of the
origin of life; they are questions regarding the origin of "complexity." Though com-
plexity does not have a precise physical meaning, we can describe some aspects
of it in terms of algorithmic information, especially in the case of self-organizing
polymer systems.
As will be shown below, thermodynamic quantities such as entropy and free
energy do not characterize all the essential features of complexity. We need new
physical quantities, perhaps quantities such as algorithmic information. In the con-
text of polymers, algorithmic information can be associated with a particular poly-
mer sequence and an algorithm can be associated with a catalyst that produces this
sequence. I would also like to point out that a physical significance of algorithmic
In the equilibrium state (A), all sequences appear with equal probability. In
the nonequilibrium states (B) and (C), only the sequences R — R — R — R — R and
R—S—R—R—S respectively appear.
A nonequilibrium state such as (B) or (C) has lower entropy compared to the
equilibrium state (A). Since there are 2^5 possible species, the difference in entropy
is ΔS = nR ln(2^5), in which n is the number of moles and R is the gas constant. At
temperature T, the difference in Helmholtz free energy is ΔF = TΔS. The immediate
consequence of this is that "weights can be lifted" through the transformation of
the nonequilibrium state to the equilibrium state. The amount of useful work that
can be obtained is equal to ΔF.
This can be done by using van't Hoff's boxes and a suitable membrane as shown
in Figure 2. Here we assume that the system is an ideal gas. The scheme consists
of two chambers, A and B, separated by a membrane that is permeable only to
polymers of a particular sequence, such as R-R-R-R-R. The pressure and
volume of chamber A are PA and VA and, for chamber B, they are PB and VB.
The entire setup is in contact with a heat reservoir at temperature T.
The volumes of the two chambers can be altered by moving pistons, as shown in
Figure 2. In chamber B, there is no catalyst that converts R to S, so the
nonequilibrium distribution remains as it is. In chamber A, however, there is a
catalyst, and hence the sequences of the polymers entering this chamber will
transform to other sequences. Thus the number of species in chamber A increases
as molecules enter it from chamber B. The partial pressure of the polymer
R-R-R-R-R in chamber A will equal the pressure, PB, in B. Since the total number
of species is 2^5, the pressure in chamber A is PA = 2^5 × PB. If initially VA = 0
and VB = VB0, then by slowly and simultaneously moving the pistons in such a way
that PA and PB are maintained constant (at their equilibrium values), all the
molecules can be forced
[Figure 2: van't Hoff box diagrams. Chamber B contains only the sequence
R-R-R-R-R; chamber A, containing the catalyst, holds the equilibrium mixture of
sequences (R-S-R-R-S, R-R-R-S-S, S-S-R-R-S, ...). The chambers are separated by
membranes permeable only to particular sequences.]
from chamber B to chamber A. At the end of this process, VA will equal VB0/2^5.
Weights can now be lifted by allowing the gas in A to expand to the initial volume
VB0. It can easily be seen that the amount of useful work that can be obtained is

ΔF = nRT ln(2^5).
Clearly, through a similar conversion of the nonequilibrium state (C) (shown
in Figure 1) to the equilibrium state (A), the same amount of useful work can be
obtained. In this way we see that a physical consequence of a nonequilibrium state
is that: "weights can be lifted."
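As a numerical illustration of the available work, the sketch below evaluates ΔF = TΔS = nRT ln(2^5); the values n = 1 mol and T = 300 K are assumptions chosen for illustration, not values from the text:

```python
import math

R = 8.314   # gas constant, J/(mol K)
T = 300.0   # temperature in K (assumed for illustration)
n = 1.0     # moles of pentamer (assumed)

delta_S = n * R * math.log(2 ** 5)   # entropy difference, J/K
delta_F = T * delta_S                # maximum useful work, Delta F = T * Delta S

print(round(delta_F / 1000, 1), "kJ")
```

For one mole at room temperature the "liftable weight" corresponds to roughly 8.6 kJ of work, the same order as familiar mixing free energies.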
Turning now to the algorithmic-information point of view, we can distinguish
nonequilibrium states (such as (B) and (C) shown in Figure 1) on the basis of
algorithmic information: the algorithm required to generate the sequence (B) is
surely shorter than the algorithm for (C). However, this difference does not have a
simple physical consequence such as "lifting weights." That no "lifting of weights"
can be accomplished through the conversion of state (C) to state (B) follows from
the existence of the process, represented in Figure 3, in which the state (B) is
converted to state (C) without any expenditure of free energy. In Figure 3, the
membrane separating the chambers L and M is permeable only to the molecule
R — R — R — R — R while the membrane separating the chambers M and R is
permeable only to R — S — R— R— S. In the central chamber, M, the polymers are
in a state of equilibrium due to the presence of a catalyst. Since the mole fractions
of all the different species are equal, the pressures in chambers L and R will be
equal. Hence by moving the pistons in the directions indicated in the figure, one
species can be converted to another with no expenditure of energy.
Note that from a computational point of view the sequence R—R—R—R—R has
been converted reversibly to the sequence R — S — R — R — S with no dissipation
of energy. This is one way of realizing the general observation made by Charles
Bennettl that computation can be done reversibly, in the sense of Carnot. Thus,
by inserting and removing appropriate membranes at the appropriate times, any
sequence can be converted to any other without dissipation.
[Figure 3: three chambers L, M, and R separated by pistons and membranes. The
membrane between L and M passes only R-R-R-R-R; the membrane between M and R
passes only R-S-R-R-S; the central chamber M holds the equilibrium state
maintained by the catalyst.]
REFERENCES
1. Bennett, C. H. "The Thermodynamics of Computation—A Review." Intl. J. Theor. Phys. 21 (1982):905-940.
2. Spiegelman, S. "An in Vitro Analysis of a Replicating Molecule." Amer. Sci.
55 (1967):221-264.
3. Zurek, W. H. "Algorithmic Randomness and Physical Entropy." Phys. Rev.
A40 (1989):4731-4751.
4. Zurek, W. H. "Algorithmic Information Content, Church-Turing Thesis,
Physical Entropy, and Maxwell's Demon." This volume.
Tad Hogg
Xerox Palo Alto Research Center, Palo Alto, CA 94304
INTRODUCTION
Distributed problem solving is a pervasive and effective strategy in situations re-
quiring adaptive responses to a changing environment. As shown by the examples of
the scientific community, social organizations, the economy, and biological ecosys-
tems, a collection of interacting agents individually trying to solve a problem using
different techniques can significantly enhance the performance of the system as
a whole. This observation also applies to computational problems, such as traffic
control, acting in the physical world, and interpreting real-time multi-sensor data,
where the emergence of computer networks and massively parallel machines has
enabled the use of many concurrent processes.
These tasks generally require adaptability to unexpected events, dealing with
imperfect and conflicting information from many sources, and acting before all rele-
vant information is available. In particular, incorrect information can arise not only
from hardware limitations but also from computations using probabilistic methods,
heuristics, rules with many exceptions, or learning resulting in overgeneralization.
Similarly, delays in receiving needed information can be due to the time required
to fully interpret signals in addition to physical communication delays.
Directly addressing problems with these characteristics usually involves tech-
niques whose resource requirements (e.g., computer time) grow exponentially with
the size of the problem. While such techniques thus have high computational
complexity,16 more sophisticated approaches, employing various heuristic tech-
niques, can often overcome this prohibitive cost. Heuristics are effective in many
interesting real-world computation problems because of the high degree to which
experience on similar problems, or subtasks, can be generalized and transferred to
new instances. Moreover, there are often many alternate approaches to each prob-
lem, each of which works well in circumstances that are difficult to characterize a
priori. It is thus of interest to examine the behavior of collections of interacting
processes or agents which solve such problems, abstracted away from detailed is-
sues of particular algorithms, implementations, or hardware. These processes must
make decisions based upon local, imperfect, delayed, and conflicting information
received from other agents reporting on their partial success towards completion of
a goal. Such characteristics, also found in social and biological communities, lead
us to refer to these collections as computational ecosystems.9
A general issue in studying such systems is to characterize their complexity
and relate it to their behavior. While a number of formal complexity measures have
been proposed, one appropriate for describing the performance of computational
systems should capture the observation that both ordered and random problems can
be addressed with relatively simple techniques. In particular, explicit algorithms are
effective when there are a limited number of contingencies to handle. Similarly, sta-
tistical techniques are useful where there are limited dependences among variables
and the relevant conditional probabilities can easily be estimated from the data.
To consider the more interesting, and more difficult, computational problems de-
scribed above, we focus on complexity measures, such as diversity, that assign high
values to intermediate situations.6,7 This is in contrast to conventional measures
of algorithmic randomness4,11,18 which are primarily concerned with the minimal
program required to reproduce a given result rather than with the difficulty of
devising programs to effectively solve particular problems.
The Dynamics of Complex Computational Systems 209
example concerns a search for a good, but not necessarily optimal, state in a lim-
ited amount of time. Throughout these examples we show how the existence of a
diverse society of processes is required to achieve this performance enhancement.
We thus obtain a connection between a measure of the system's complexity and its
performance.
CONCURRENT SEARCH
We consider the case of heuristically guided search,14 which applies to a wide range
of problems. A search procedure can be thought of as a process which examines a
series of states until a particular goal state is obtained. These states typically rep-
resent various potential solutions of a problem, usually obtained through a series of
choices. Various constraints on the choices can be employed to exclude undesirable
states. Examples range from well-defined problem spaces as in chess to problems in
the physical world such as robot navigation.
As a specific example, consider the case of a d-dimensional vector, each of whose
components can take b different values. The search consists of attempting to find a
particular suitable value (or goal) among the bd possible states. It is thus a simple
instance of constrained search involving the assignment of values to components of
a vector subject to a number of constraints. A random search through the space
will, on average, find the goal only after examining one half of the possibilities,
an extremely slow process for large problems (i.e., the required time is exponential
in d, the number of components to be selected). Other specific approaches can be
thought of as defining an order in which the possible states are examined, with
the ensuing performance characterized by where in this sequence of states the goal
appears. We now suppose that n agents or processes are cooperating on the solution
of this problem, using a variety of heuristics and that the problem is completed by
the first agent to find the solution. The heuristic used by agent i can be simply
characterized by the fraction f_i, between 0 and 1, of unproductive states that it
examines before reaching the goal. A perfect heuristic will thus correspond to f_i = 0
and one which chooses at random has f_i = 1/2.
In addition to their own search effort, the agents exchange information regard-
ing the likely location of the goal state within the space. In terms of the sequence
of states examined by a particular agent, the effect of good hints is to move the
goal toward the beginning of the sequence by eliminating from consideration states
that would otherwise have to be examined. A simple way to characterize a hint is
by the fraction of unproductive nodes, which would otherwise have been examined
before reaching the goal, that the hint removes from the search. Since hints need
not always be correctly interpreted, they can also lead to an increase in the actual
number of nodes examined before the answer is found. For such cases, we suppose
that the increase, on average, is still proportional to the amount of work remaining,
i.e., bad hints won't cause the agent to nearly start over when it is already near
the goal but will instead only cause it to reintroduce a small number of additional
possibilities. Note that the effectiveness of hints depends not only on the validity
of their information, but also on the ability of recipients to interpret and use them
effectively. In particular, the effect of the same hint sent to two different agents can
be very different.
A simple example of this characterization of hint effectiveness is given by a
concurrent search by many processes. Suppose there are a number of characteris-
tics of the states that are important (such as gender, citation, and subfield in a
database). Then a particular hint specifying gender, say, would eliminate one half
of all remaining states in a process that is not explicitly examining gender.
To the extent that the fractions of unproductive nodes pruned by the various
hints are independent, the fraction of nodes that an agent i will have to consider is
given by

f_i = f_i^initial ∏_{j≠i} f_{ij}^hint ,   (1)

where f_{ij}^hint is the fraction of nodes remaining after the hint that agent i
receives from agent j, and f_i^initial characterizes the performance of the agent's
initial heuristic. Note that hints which are very noisy or uninterpretable by the
agent correspond to a fraction equal to one because they do not lead to any pruning
on the average. Conversely, a perfect hint would directly specify the goal and make
f_i equal to zero.
Furthermore, we should note that since hints will generally arrive over time during
the search, the fractions characterizing the hints are interpreted as effective values
for each agent, i.e., a good hint received late, or not utilized, will have a small effect
and a corresponding hint fraction near one.
The assumption of independence relies on the fact that the agents broadcast
hints that are not overlapping, i.e., the pruning of two hints won't be correlated.
This will happen whenever the agents are diverse enough so as to have different
procedures for their own searches. If the agents were all similar, i.e., the pruning
was the same for all of them, the product in Eq. (1) would effectively only have one
factor. For intermediate cases, the product would only include those agents which
differ from each other in the whole population. As an additional consideration, the
overall heuristic effectiveness fi must not exceed one, so there is a limit to the
number of independent hint fractions larger than one that can appear in Eq. (1).
We therefore define neff to be the effective number of diverse agents, which in turn
defines the actual number of terms in the product of Eq. (1). This leads to a direct
dependence of the pruning effectiveness on the diversity of the system. Although
the hints that individual agents find useful need not come from the same sources,
for simplicity, we suppose the number of diverse hints received by each agent is the
same.
We now derive the law that regulates the pruning effectiveness among agents.
By taking logarithms in Eq. (1), one obtains

log f_i = log f_i^initial + Σ_{j≠i} log f_{ij}^hint ,   (2)

where we have included only terms arising from diverse hints. If the individual
distributions of the logarithms of the fractions satisfy the weak condition of
having a finite variance, and if the number of hints is large, then the central limit
theorem applies. Therefore, the values of log f_i for the various agents will be
normally distributed around their mean with standard deviation σ, i.e., according to
N(µ, σ, log f_i). Here µ and σ² are the mean and variance of the log f_i of the various
agents, which are given by the sums of the corresponding moments of the individual
terms in the sum. In other words, f itself is distributed according to the lognormal
distribution.
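The emergence of this limiting behavior can be seen in a small simulation; the uniform hint-fraction distribution used below is an illustrative assumption (centered on one, so hints on average neither help nor hinder):

```python
import math
import random

random.seed(0)
n_agents = 20000
n_eff = 50   # effective number of diverse hints per agent (assumed)

# Each agent's fraction f_i is its initial fraction times a product of
# independent hint fractions, as in Eq. (1).
log_f = []
for _ in range(n_agents):
    f = 0.5   # f_i^initial: a bit better than random
    for _ in range(n_eff):
        f *= random.uniform(0.8, 1.2)   # assumed hint-fraction distribution
    log_f.append(math.log(f))

mean = sum(log_f) / n_agents
sigma = math.sqrt(sum((x - mean) ** 2 for x in log_f) / n_agents)

# If log f_i is approximately normal, about 68% of agents should lie
# within one standard deviation of the mean.
within = sum(abs(x - mean) < sigma for x in log_f) / n_agents
print(round(within, 2))
```

The one-sigma fraction comes out close to the Gaussian value of 0.68, as the central limit theorem predicts for the sum of many independent log-fractions.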
This can be viewed as an extension of the previous example in that successive levels
of the tree correspond to choices for successive components of the desired vector,
with the leaves of the tree corresponding to fully specified vectors. The additional
tree structure becomes relevant when the heuristic can evaluate choices based on
vectors with some components unspecified. These evaluations offer the possibility
of eliminating large groups of nodes at once.
The search proceeds by starting at the root and recursively choosing which
nodes to examine at successively deeper levels of the tree. At each node of the tree
there is one correct choice, in which the search gets one step closer to the goal.
All other choices lead away from the goal. The heuristic used by each agent can
then be characterized by how many choices are made at a particular node before
the correct one is reached. The perfect heuristic would choose correctly the first
time, and would find the goal in d time steps, whereas the worst one would choose
the correct choice last, and hence be worse than random selection. To characterize
an agent's heuristic, we assume that each incorrect choice has a probability p of
being chosen by the heuristic before the correct one. Thus the perfect heuristic
corresponds to p = 0, random to p = 0.5, and worst to p = 1. For simplicity, we
suppose the heuristic effectiveness, as measured by p, is uniform throughout the
tree. Alternatively, p can be thought of as the value of the effectiveness averaged
over all nodes in the tree. In the latter case, any particular correlations between
nodes are ignored, in the spirit of a mean-field theory, which can be expected to
apply quite well in large-scale problems. Note that while p specifies the fraction of
incorrect choices made before the correct one on average throughout the tree, this
probabilistic description allows for variation among the nodes.
The a posteriori effect of hints received from other agents can be described as
a modification to an agent's value of p. Assuming independence among the hints
received, this probability is given by
p_i = p_i^initial ∏_{j=1, j≠i}^{n_eff} f_{ij}^hint ,   (6)

where p_i^initial characterizes the agent's initial heuristic and the hint fractions are the
same as introduced in the previous section, but now averaged over the entire tree.
By supposing the various quantities appearing in Eq. (6) are random variables,
we again obtain the universal lognormal distribution (over the set of agents) of
heuristic effectiveness when there are a large number of agents exchanging hints.
Given this distribution of local decision effectiveness, we now need the distri-
bution of performance in the full search problem, i.e., the rate at which the search
for the goal is completed. This relationship is more complex than in the unstruc-
tured example considered above, and in particular it produces a phase transition
in overall agent performance at a critical value of p. This sharp transition leads to
the possibility of an additional enhancement in performance.
Specifically, the overall performance is related to the time T, or number of
steps, required to reach the goal from the root of the tree. To quantify the search
performance, we consider the search speed given by

S = (number of nodes in the tree) / (number of steps to the goal) = N_total / T ,   (7)

and the mean number of steps to the goal is

⟨T⟩ = d + (µ - p)(d - µ - dµ + µ^{d+1}) / (µ - 1)² ,   (8)

where µ = bp. As the depth of the tree increases, this becomes increasingly singular
around the value µ = 1, indicating a sudden transition from linear to exponential
search. This is illustrated in Figure 2, which shows the behavior of s = d/⟨T⟩ as a
function of the local decision effectiveness characterized by p. Near the transition,
a small change in the local effectiveness of the heuristic has a major impact on
the global behavior of large-scale search problems. The existence of such a phase
transition implies that, in spite of the fact that the average behavior of cooperative
algorithms may be far into the exponential regime, the appearance of an extended
tail in performance makes it possible for a few agents to solve the problem in
polynomial time. In such a case, one obtains a dramatic improvement in overall
system performance by combining these two effects. We should note that other
search topologies such as general graphs also exhibit these phase transitions3 so
these results can apply to a wide range of topologies found in large-scale search
problems.
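The linear-to-exponential transition can be sketched numerically. The function below uses ⟨T⟩ = d + (µ - p)(d - µ - dµ + µ^{d+1})/(µ - 1)² with µ = bp, a reconstruction of Eq. (8) from this garbled copy that matches the limiting behavior quoted in the text, so treat the numbers as qualitative:

```python
def mean_time(p, b, d):
    """Reconstructed Eq. (8); singular at mu = b*p = 1, so avoid p = 1/b."""
    mu = b * p
    return d + (mu - p) * (d - mu - d * mu + mu ** (d + 1)) / (mu - 1) ** 2

def speed(p, b, d):
    return d / mean_time(p, b, d)   # s = d / <T>

b, d = 5, 100
s_linear = speed(0.1, b, d)        # mu = 0.5 < 1: search time linear in d
s_exponential = speed(0.3, b, d)   # mu = 1.5 > 1: exponentially slow search

print(s_linear > 0.5, s_exponential < 1e-10)
```

A perfect heuristic (p = 0) gives ⟨T⟩ = d and s = 1, while just above the critical value p = 1/b the speed collapses by many orders of magnitude, which is the sharp transition visible in Figure 2.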
FIGURE 2 Plot of s vs. local decision effectiveness for trees with branching ratio 5
and depths 10, 20, and 100. The distinction between the linear regime (p < 0.2) and
the exponential one becomes increasingly sharp as the depth increases. The dashed
curve is the limit for an infinitely deep tree and shows the abrupt change at p = 0.2
from linear to exponential search.
Finally, to illustrate the result of combining diverse hints with the phase
transition in tree searches, we evaluate the distribution of relative global speed s for the
agents searching in a tree with a branching ratio b = 5 and depth d = 20. This com-
bines the distribution of local decision effectiveness with its relation to global speed.
As in the previous example, we suppose hints on average neither help nor hinder the
agents. In particular, we take the f_{ij}^hint values to be normally distributed according
to N(1, 0.015, f). We also take the initial performance of the agents (i.e., p_i^initial)
to be normally distributed according to N(0.33, 0.0056, p), which corresponds to a
bit better than random search. The resulting distributions were evaluated through
simulations of the search process and are compared in Figure 3, on a logarithmic
scale to emphasize the extended tails.
In this case, the enhancement of the global performance of the system is most
dramatic at the higher end of the distribution, not all of which is shown in the
figure. In this example, the top 0.1 percentile agents will have an enhancement of
global speed over the case of no hints by factors of 2 and 41 for 10 and 100 hints
respectively. This illustrates the nonlinear relation between performance, number
of agents, and diversity of hints.
SATISFICING SEARCHES
In many heuristic search problems, the exponential growth of the search time with
problem size forces one to accept a satisfactory answer rather than an optimal one.
In such a case, the search returns the best result found in a fixed amount of time
rather than continuing until the optimal value is found. To the extent that such
returned results have high value, they can provide acceptable solutions to the search
problem without the cost involved in obtaining the true optimum. A well-known
instance is the traveling salesman problem, consisting of a collection of cities and
distances between them and an attempt to find the shortest path which visits each
of them. The time required to find this path grows exponentially with the number
of cities. For large instances of the problem, one must settle instead for paths that
are reasonably short, compared to the length of an average path, but not optimal.
In these cases of limited search time, the extended tails of the cooperative
distributions discussed above result in a better value returned compared to cases in
which hints are not used. To see this we consider an unstructured search problem
where the states have various values, v, which we take to be integers between 0
and some maximum V. In the previous examples, one could view the single goal
as having the maximum value while all other states have a value of 0. To allow
for the possible usefulness of nonoptimal states, we suppose that their values are
distributed throughout the range. In order that a simple random search is unlikely
to be effective, we need relatively few states with high value. A simple distribution
of values satisfying these requirements is given by the binomial distribution:
m_v = (V choose v) 3^{V-v} ,   (9)

where m_v is the number of states with value v. Note that this has exactly one state
with the maximum value and most states have smaller values clustered around the
average V/4.
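These properties of the binomial value distribution can be checked directly for a small V (V = 8 below is an arbitrary illustrative choice):

```python
from math import comb

V = 8   # maximum value, kept small for illustration
m = [comb(V, v) * 3 ** (V - v) for v in range(V + 1)]   # Eq. (9)

assert m[V] == 1                 # exactly one state with the maximum value
assert sum(m) == 4 ** V          # total number of states
avg = sum(v * mv for v, mv in enumerate(m)) / sum(m)
assert abs(avg - V / 4) < 1e-12  # values cluster around the average V/4
print("ok")
```

The distribution is binomial with success probability 1/4, so high-value states are exponentially rare, as the text requires for random search to be ineffective.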
For problems of this kind, the effectiveness of a heuristic is determined by how
well it can discriminate between states of high and low value. When faced with
selecting among states with a range of values, a good heuristic will tend to pick
those states with high value. That is, the likelihood of selecting a state will increase
with its value. Moreover, this increase will become more rapid as the heuristic
improves. As a concrete example, we suppose that the heuristics used by the various
agents in the search are characterized by a discrimination parameter a such that
states with value v are selected by the heuristic with relative probability a^v. Large
values of a provide excellent discrimination while a = 1 corresponds to random
selections. In terms of our previous examples, in which only the goal had a nonzero
value, the relative selection probabilities were 1 for the goal and p for all other
states. Thus we see that this characterization of heuristic discrimination identifies
a^V with 1/p in the case of only two distinct values. As in the previous examples,
cooperation among diverse agents leads to a lognormal distribution of selection
probability values among the agents. Here this means the a values will themselves
be lognormally distributed.
Instead of focusing on the time required to find the best answer, we examine the
distribution of values returned by the various agents in a given interval of time. As
an extreme contrast with the previous examples, which continued until the goal was
found, we allow each agent to examine only one state, selected using the heuristic.
The value returned by the agent will then correspond to this state. (If additional
time were available, the agents would continue to select according to their heuristic
and return the maximum value found.) These simplifications can be used to obtain
the distribution of returned values resulting from interactions among the agents as
a function of the number of diverse agents, n_eff.
Since all points are available to be selected, the probability that an agent op-
erating with a heuristic discrimination level of a will select a state with value v
is
p(a, v) = m_v a^v / Σ_{u=0}^{V} m_u a^u = (V choose v) (a/3)^v / (1 + a/3)^V .   (10)
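Because Σ_u m_u a^u = (3 + a)^V for the value distribution of Eq. (9), the selection probability reduces to the closed form C(V, v)(a/3)^v / (1 + a/3)^V. A quick numerical check with illustrative values of V and a:

```python
from math import comb

V, a = 10, 1.5   # illustrative values
m = [comb(V, u) * 3 ** (V - u) for u in range(V + 1)]   # Eq. (9)
norm = sum(mu * a ** u for u, mu in enumerate(m))       # equals (3 + a)**V

for v in range(V + 1):
    direct = m[v] * a ** v / norm                        # definition
    closed = comb(V, v) * (a / 3) ** v / (1 + a / 3) ** V  # closed form
    assert abs(direct - closed) < 1e-12
print("match")
```

The normalization is just the binomial theorem applied to Σ C(V,u) 3^{V-u} a^u, which is why the closed form exists.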
To finally obtain the distribution of values returned by the agents, this must be
integrated over the distribution of a values. When hints are exchanged, this
parameter will be distributed lognormally with a mean µ and standard deviation σ
depending on the corresponding values for the hint fractions. The result can be
written as

P(v) = (1/√(2π)) (V choose v) e^{vµ̃ + (vσ)²/2} ∫_{-∞}^{∞} dt e^{-t²/2} (1 + e^{µ̃ + vσ² + σt})^{-V} ,   (11)

where µ̃ = µ - ln 3.
The distributions are compared in Figure 4 for the case in which the initial
agents' heuristic has a = 1.5 (i.e., a bit better than random value discrimination)
and the hint fractions are distributed according to N(1, 0.05), again giving a case
in which the hints, on average, neither help nor hinder the search. In this case, the
top 0.1 percentile level is at a value v = 52 when neff = 10 and v = 70 when
neff = 100. This compares with the noninteracting case in which this performance
level is at v = 48.
220 Tad Hogg
CONCLUSION
The effectiveness of the hints exchanged among the agents discussed in the previous
sections depended critically on how independently they were able to prune the
search space. At one extreme, when all the agents use the same technique, the hints
will not provide any additional pruning. Similarly, if the various agents randomly
search through the space and only report the nodes which they have already ex-
amined, this will not significantly help the other agents. More specifically, highly
structured problems can be rapidly addressed by relatively simple direct algorithms.
Although various processes may run in parallel, the structure will allow the decom-
position to be such that each agent provides an independently needed part of the
answer. This would give no possibility of (and no need for) improvement with hints.
On the other hand, in highly disordered problems, each part of the search space will
be unrelated to other parts, giving little or no possibility of transferring experience
among the agents, and hence exponentially long solution times.
Many interesting problems, such as those requiring adaptive response to the
physical world or finding entries in large databases relevant to various users, are
intermediate in nature. Although simple direct algorithms do not exist, these prob-
lems nevertheless have a large degree of redundancy thus enabling the transfer of
results between different parts of the search. It is just this redundancy which al-
lows for the existence of effective heuristics and various techniques which can be
exploited by collections of cooperative processes. In particular, the fact that most
constraints in a search involve only a few of the choices at a time gives an effective
locality to most of the interactions among allowed choices. More fundamentally, this
characteristic can be viewed as a consequence of intermediate stable states required
for the design or evolution of the systems dealt with in these problems.17
The examples considered in the previous sections have shown quantitatively
how systems which effectively deal with this class of problems (e.g., economic and
biological communities as well as distributed computer systems currently under
development) can benefit from the exchange of hints. For topological structures
with sharp phase transitions in behavior, performance can be further enhanced
when the exchange of hints allows even a few agents to reach the transition point.
In summary, this provides a connection between complexity measures and actual
performance for interacting computational processes.
There remain a number of interesting open issues. The examples presented
above ignored the fact that hints will actually arrive over time, presumably im-
proving as other agents spend more time in their individual searches. On the other
hand, the usefulness of the hints to the recipient process could decline as it pro-
gresses with its own search, filling in specific details of a solution. Thus, in more
realistic models, the hint pruning fractions f_{ij} will depend on the current state
of agents i and j, giving rise to a range of dynamical behaviors. In addition, the
examples neglected any variation in cost (in terms of computer time) with the de-
gree to which hints were effectively used. More generally, the effectiveness of a hint
could depend on how much time is spent constructing it (e.g., presenting it in a
most general context where it is more likely to be applicable) and analyzing it. Such
variation could be particularly important in satisficing searches where additional
time devoted to improving hints means fewer states can be examined. Finally, over
longer time scales, as the system is applied to a range of similar problems, there is
the possibility of increased diversity among the agents as they record those strate-
gies and hints which proved most effective. Thus, in addition to showing that the
most diverse systems are best able to address difficult problems, this opens the pos-
sibility of studying the gradual development of specialized agents and the resulting
improvement in performance.
ACKNOWLEDGMENTS
During the course of this work, I have benefited from many conversations with B.
Huberman, J. Kephart, and S. Stornetta.
REFERENCES
1. Aitchison, J., and J. A. C. Brown. The Log-Normal Distribution. Cambridge:
Cambridge Univ. Press, 1957.
2. Bennett, C. H. "Dissipation, Information, Computational Complexity and the
Definition of Organization." In Emerging Syntheses in Science, edited by D.
Pines. Santa Fe, NM: Santa Fe Institute, 1986, 297-313.
3. Bollobas, B. Random Graphs. New York: Academic Press, 1985.
4. Chaitin, G. "Randomness and Mathematical Proof." Sci. Am. 232 (1975):47-
52.
5. Crow, Edwin L., and Kunio Shimizu, editors. Lognormal Distributions: The-
ory and Applications. New York: Marcel Dekker, 1988.
6. Crutchfield, J. P., and K. Young. "Inferring Statistical Complexity." Phys.
Rev. Lett. 63 (1989):105-108.
7. Huberman, B. A., and T. Hogg. "Complexity "and Adaptation." Physica 22D
(1986):376-384.
8. Huberman, B. A., and T. Hogg. "Phase Transitions in Artificial Intelligence
Systems." Artificial Intelligence 33 (1987):155-171.
9. Huberman, Bernardo A., and Tad Hogg. "The Behavior of Computational
Ecologies." In The Ecology of Computation, edited by B. A. Huberman.
Amsterdam: North Holland, 1988, 77-115.
10. Kephart, J. 0., T. Hogg, and B. A. Huberman. "Dynamics of Computational
Ecosystems." Phys. Rev. A 40 (1989):404-421.
11. Kolmogorov, A. N. "Three Approaches to the Quantitative Definition of Ran-
domness." Prob. of Info. Trans. 1 (1965):1-7.
12. Krebs, C. J. Ecology. New York: Harper and Row, 1972.
13. Montroll, E. W., and M. R. Shlesinger. "On 1/f Noise and Other Distribu-
tions with Long Tails." Proc. Natl. Acad. Sci. (USA) 79 (1982):3380-3383.
14. Pearl, J. Heuristics: Intelligent Search Strategies for Computer Problem Solv-
ing. Reading, MA: Addison-Wesley, 1984.
15. Shockley, W. "On the Statistics of Individual Variations of Productivity in
Research Laboratories." Proc. of the IRE 45 (1957):279-290.
16. Sedgewick, R. Algorithms. New York: Addison-Wesley, 1983.
17. Simon, H. The Sciences of the Artificial. Cambridge, MA: MIT Press, 1962.
18. Solomonoff, R. "A Formal Theory of Inductive Inference." Info. & Control 7
(1964):1-22.
James P. Crutchfield† and Karl Young‡
†Physics Department, University of California, Berkeley, CA 94720; ‡permanent address:
Physics Board of Studies, University of California, Santa Cruz, CA 95064
[1]It is not an idle speculation to wonder what happens to Einstein's universe if his clock contains
an irreducible element of randomness, or more realistically, if it is chaotic.
Computation at the Onset of Chaos 225
various hard-to-solve, but easily verified, problems. This is the class of nondeter-
ministic polynomial (NP) problems. If one can guess the correct answer, it can be
verified as such in polynomial time. The equivalence between NP problems, called
NP-completeness, requires that within a polynomial number of TM steps a prob-
lem can be reduced to one hardest problem.34 The invariant of this polynomial-time
reduction equivalence is the growth rate, as a function of problem size, of the com-
putation required to solve the problem. This growth rate is called the algorithmic
complexity.[5]
The complementarity between these two endeavors can be made more explicit
when both are focused on the single problem of modeling chaotic dynamical systems.
Ergodic theory is seen to classify complicated behavior in terms of information
production properties, e.g., via the metric entropy. Computation theory describes
the same behavior via the intrinsic amount of computation that is performed by
the dynamical system. This is quantified in terms of machine size (memory) and
the number of machine steps to reproduce behavior.[6] It turns out, as explained in
more detail below, that this type of algorithmic measure of complexity is equivalent
to entropy. As a remedy to this we introduce a complexity measure based on BTMs
that is actually complementary to the entropy.
The emphasis in the following is that the tools of each field are complemen-
tary and both approaches are necessary to completely describe physical complexity.
The basic result is that if one is careful to restrict the class of computational mod-
els assumed to be the least powerful necessary to capture behavior, then much of
the abstract theory of computation and complexity can be constructively implemented.[7]
From this viewpoint, phase transitions in physical systems are seen to
support high levels of computation. And conversely, computers are seen to be phys-
ical systems designed with a subset of "critical" degrees of freedom that support
computational fluctuations.
The discussion has a top-down organization with three major parts. The first,
consisting of this section and the next, introduces the motivations and general
formalism of applying computational ideas to modeling dynamical systems. The
second part develops the basic tools of e-machine reconstruction and a statistical
mechanical description of the machines themselves. The third part applies the tools
to the particular class of complex behavior seen in cascade transitions to chaos. A
few words on further applications conclude the presentation.
[5]In fact, the invariant actually used is a much coarsened version of the algorithmic complexity:
a polynomial-time reduction is required only to preserve the exponential character of solving a
hard problem.
[6]We note that computation theory also allows one to formalize how much effort is required to
infer a dynamical system from observed data. Although related, this is not our present concern.27
[7]At the highest computation level of universal Turing machines, descriptions of physical complexity
are simply not constructive since finding the minimal TM program for a given problem is
undecidable in general.38
CONDITIONAL COMPLEXITY
The basic concept of complexity that allows for dynamical systems and computation
theories to be profitably linked relies on a generalized notion of structure that we
will refer to generically as "symmetry." In addition to repetitive structure, we also
consider statistical regularity to be one example of symmetry. The idea is that a
data set is complex if it is the composite of many symmetries.
To connect back to the preceding discussion, we take as two basic dynamical
symmetries those represented by the model basis {Bt,Pt}. A complex process will
have, at the very least, some nontrivial combination of these components. Simply
predictable behavior and purely random behavior will not be complex. The corre-
sponding complexity spectrum is schematically illustrated in Figure 1.
More formally, we define the conditional complexity C(D|S) to be the amount
of information in equivalence classes induced by the symmetry S in the data D plus
the amount of data that is "unexplained" by S. If we had some way of enumerating
all symmetries, then the absolute complexity C(D) would be

C(D) = \inf_{\{S\}} C(D|S) .
And we would say that an object is complex if, after reduction, it is its own sym-
metry. In that case, there are no symmetries in the object, other than itself.[8] If
D is the best model of itself, then there is no unexplained data, but the model is
large: C(D|D) ∝ length(D). Conversely, if there is no model, then all of the data
[8]Or, said another way, the complex object is only described by a large number of equivalence
classes induced by inappropriate symmetries. The latter can be illustrated by considering an
inappropriate description of a simple object. A square wave signal is infinitely complex with
respect to a Fourier basis. But this is not an intrinsic property of square waves, only of the choice
of model basis. There is a model basis that gives a very simple description of a square wave.
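The square-wave point can be made concrete. The Fourier-sine coefficients of a unit square wave are b_k = 4/(πk) on the odd harmonics and 0 on the even ones, so a Fourier "model basis" needs infinitely many terms, while a time-domain description (period and duty cycle) is just two numbers. A small numerical sketch (our illustration, not from the text):

```python
from math import sin, pi

def b_k(k: int, N: int = 20000) -> float:
    """Fourier-sine coefficient b_k of the unit square wave on [0, 1),
    approximated by a midpoint Riemann sum of 2 * integral sq(t) sin(2 pi k t) dt."""
    total = 0.0
    for n in range(N):
        t = (n + 0.5) / N
        sq = 1.0 if t < 0.5 else -1.0
        total += 2.0 * sq * sin(2.0 * pi * k * t)
    return total / N

odd = [b_k(k) for k in (1, 3, 5, 7)]   # analytically 4/(pi k): never vanish
even = [b_k(k) for k in (2, 4)]        # analytically 0
```

The odd coefficients decay only as 1/k, so truncating the Fourier description at any finite order always leaves structure unexplained.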
228 Complexity, Entropy, and the Physics of Information
[9]This computational framework for modeling also applies, in principle, to estimating symbolic
equations of motion from noisy continuous data.21 Generally, minimization is an application of
Occam's Razor in which the description is considered to be a "theory" explaining the data.42
Rissanen's minimum-description-length principle, the coding theoretic version of this philosophical
axiom, yields asymptotically optimal representations.61,62
[10]In information theoretic terms we are requiring stationarity and ergodicity of the source.
[11]We are necessarily skipping over a number of details, such as how the state x_t is discretized
into a string over a finite alphabet. The basic point made here has been emphasized some time
ago.8,17
situations they measure the same dynamical property captured by the informa-
tion theoretic phrase "entropy." "Complexity" shall refer to conditional complexity
with respect to BTM computational models. We could qualify it further by using
"physical complexity," but this is somewhat misleading since it applies equally well
outside of physics.[12]
We are not aware of any means of enumerating the space of symmetries and so
the above definition of absolute complexity, while of theoretical interest, is of little
immediate application. Nonetheless, we can posit that symmetries S be effectively
computable in order to be relevant to scientific investigation. According to the
physical variant of the Church-Turing thesis then, S can be implemented on a
BTM; which is to say that as far as realizability is concerned, the unifying class of
symmetries that we have in mind is represented by operations of a BTM. Although
the mathematical specification for a BTM is small, its range of computation is vast
and at least as large as the underlying UTM. It is, in fact, unnecessarily powerful
so that many questions, such as finding a minimal program for given data, are
undecidable and many quantities, such as the conditional complexity C(DIBTM),
are noncomputable. More to the point, adopting too general a computational model
results in there being little to say about a wide range of physical processes.
Practical measures of complexity are based on lower levels of Chomsky's computational
hierarchy.[13] Indeed, Turing machines appear only at the pinnacle of
this graded hierarchy. The following concentrates on deterministic finite automata
(DFA) and stack automata (SA) complexity, the lowest two levels in the hierarchy.
DFAs represent strictly clock and coin-flip modeling. SAs are DFAs augmented by
an infinite memory with restricted push-down stack access. We will demonstrate
how DFA models break down at a chaotic phase transition and how higher levels
of computational model arise naturally. Estimating complexity types beyond SAs,
such as linear bounded automata (LBA), is fraught with certain intriguing difficul-
ties and will not be attempted here. Nonetheless, setting the problem context as
broadly as we have just done is useful to indicate the eventual goals that we have
in mind and to contrast the present approach to other long-standing proposals that
UTMs are the appropriate framework with which to describe the complexity of
natural processes.[14] Even with the restriction to Chomsky's lower levels, a good
deal of progress can be made since, as will become clear, contemporary statistical
mechanics is largely associated with DFA modeling.
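The gap between the lowest Chomsky levels can be illustrated with two toy recognizers (our own illustration, not from the text): a two-state DFA, which has only finite memory, and a push-down (stack) recognizer for the non-regular language {a^n b^n}, whose unbounded stack supplies the counting that no finite-state machine can do.

```python
def dfa_accepts_even_parity(s: str) -> bool:
    """A two-state DFA: accepts binary strings with an even number of 1s.
    Finite memory only -- the hallmark of the lowest level."""
    state = 0
    for c in s:
        if c == "1":
            state ^= 1
    return state == 0

def sa_accepts_anbn(s: str) -> bool:
    """A push-down recognizer for {a^n b^n}: push each 'a', pop one per 'b';
    accept iff the stack empties exactly and no 'a' follows a 'b'."""
    stack = []
    seen_b = False
    for c in s:
        if c == "a":
            if seen_b:
                return False
            stack.append(c)
        elif c == "b":
            seen_b = True
            if not stack:
                return False
            stack.pop()
        else:
            return False
    return not stack
```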
[12]This definition of complexity and its basic properties as represented in Figure 1 were presented
by the first author at the International Workshop on "Dimensions and Entropies in Chaotic
Systems," Pecos, New Mexico, 11-16 September 1985.
[13]Further development of this topic is given elsewhere.28,29
[14]We have in mind Kolmogorov's work48 over many years that often emphasizes dynamical and
physical aspects of this problem. Also, Bennett's notion of "logical depth" and his analysis of physical
processes typically employ UTM models.5 Wolfram's suggestion that the computational
properties of intractability and undecidability will play an important role in future theoretical
physics assumes UTMs as the model basis. More recently, Zurek has taken up UTM descriptions
of thermodynamic processes. The information metric used there was also developed from a
conditional complexity.24
RECONSTRUCTING ε-MACHINES
To effectively measure intrinsic computational properties of a physical system we
infer an ε-machine from a data stream obtained via a measuring instrument.25 An
ε-machine is a stochastic automaton of the minimal computational power yielding
a finite description of the data stream. Minimality is essential. It restricts the scope
of properties detected in the ε-machine to be no larger than those possessed by the
underlying physical system. We will assume that the data stream is governed by a
stationary measure. That is, the probabilities of fixed length blocks of measurements
exist and are time-translation invariant.
The goal, then, is to reconstruct from a given physical process a computation-
ally equivalent machine. The reconstruction technique, discussed in the following,
is quite general and applies directly to the modeling task for forecasting temporal
or spatio-temporal data series. The resulting minimal machine's structure indicates
the inherent information processing, i.e., transmission and computation, of the orig-
inal physical process. The associated complexity measure quantifies the ε-machine's
informational size; in one limit, it is the logarithm of the number of machine states.
The machine's states are associated with historical contexts, called morphs, that are
optimal for forecasting. Although the simplest (topological) representation of an
ε-machine at the lowest computational level (DFAs) is in the form of labeled directed
graphs, the full development captures the probabilistic (metric) properties of the
data stream. Our complexity measure unifies a number of disparate attempts to de-
scribe the information processing of nonlinear physical systems.4,6,17,19,21,35,59,65,70
The following two sections develop the reconstruction method for the machines and
their statistical mechanics.
The initial task of inferring automata from observed data falls under the
purview of grammatical inference within formal learning theory.22 The inference
technique uses a particular choice S of symmetry that is appropriate to forecasting
the data stream in order to estimate the conditional complexity C(DIS). The aim is
to infer generalized "states" in the data stream that are optimal for forecasting. We
will identify these states with measurement sequences giving rise to the same set of
possible future sequences.[15] Using the temporal translation invariance guaranteed
by stationarity, we identify these states using a sliding window that advances one
measurement at a time through the sequence. This leads to the second step in the
inference technique, the construction of a parse tree for the measurement sequence
probability distribution. This is a coarse-grained representation of the underlying
process' measure in orbit space. The state identification requirement then leads to
an equivalence relation on the parse tree. The machine states correspond to the
induced equivalence classes; the state transitions, to the observed transitions in the
tree between the classes. We now give a more formal development of the inference
method.
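A minimal, purely topological sketch of this inference idea (our simplification of the procedure described above): treat two length-L measurement histories as the same state when the sets of length-L futures observed after them coincide. Run on data from the golden-mean process (binary sequences with no "11" block, our illustrative example), it recovers the expected two states.

```python
import random
from collections import defaultdict

def reconstruct_states(seq: str, L: int):
    """Group length-L history words into states: two histories are merged
    when the sets of length-L futures seen after them are identical
    (a purely topological sketch of subtree similarity)."""
    futures = defaultdict(set)
    for i in range(len(seq) - 2 * L + 1):
        futures[seq[i:i + L]].add(seq[i + L:i + 2 * L])
    classes = defaultdict(list)
    for past, fut in futures.items():
        classes[frozenset(fut)].append(past)
    return [sorted(c) for c in classes.values()]

# Golden-mean process data stream: after a "1" the next symbol must be "0".
rng = random.Random(1)
bits = ["0"]
for _ in range(20000):
    bits.append("0" if bits[-1] == "1" else rng.choice("01"))
seq = "".join(bits)
states = reconstruct_states(seq, 2)
```

Here the histories "00" and "10" share the same future set and merge into one state, while "01" (which forces the next symbol to be 0) forms the other.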
[15] We note that the same construction can be done for past possibilities. We shall discuss this
alternative elsewhere.
The first step is to obtain a data stream. The main modeling ansatz is that the
underlying process is governed by a noisy discrete-time dynamical system
x_{n+1} = F(x_n) + \xi_n
[16]We ignore for brevity's sake the question of extracting from a single component {x_{t,i}} an
adequate reconstructed state space.57
[17]The picture here is that a particular L-cylinder is a name for that bundle of orbits {x_n} each
of which visited the sequence of partition elements indexed by the L-cylinder.
This gives a hierarchical approximation of the measure in orbit space. Tree
representations of data streams are closely related to the hierarchical algorithm
used for estimating dynamical entropies.17,26
At the lowest computational level ε-machines are represented by a class of labeled,
directed multigraphs, or ℓ-digraphs.30 They are related to the Shannon graphs
of information theory,63 to Weiss's sofic systems in symbolic dynamics,33 to discrete
finite automata in computation theory,38 and to regular languages in Chomsky's
hierarchy.11 Here we are concerned with probabilistic versions of these. Their topological
structure is described by an ℓ-digraph G = {V, E} that consists of vertices
V = {v_i} and directed edges E = {e_i} connecting them, each of the latter labeled
by a symbol s ∈ A.
To reconstruct a topological ε-machine we define an equivalence relation, subtree
similarity, denoted ~, on the nodes of the tree T by the condition that the
L-subtrees are identical:
where p(v'|v; s) is the transition probability from vertex v to v' along an edge labeled
with symbol s, p(s|v) is the probability that s is emitted on leaving v, and
p_v is the probability of vertex v. A deterministically accepting ε-machine is reconstructible
from L-level equivalence classes if the indeterminacy vanishes. Finite indeterminacy, at
some given {L, e, r, b}, indicates a residual amount of extrinsic noise at that level
of approximation. In this case, the optimal machine in a set of machines consistent
with the data is the smallest that minimizes the indeterminacy.27
A = \{ i : i = 0, \ldots, k - 1;\ k = O(\epsilon^{-m}) \}

for the vertices v_i ∈ V. We will distinguish two subsets of vertices. The first, V_t,
consists of those associated with transient states; the second, V_r, consists of recurrent
states. The α-order total Renyi entropy,60 or "free information," of the measurement
sequence up to n-cylinders is given by
H_\alpha(n) = (1 - \alpha)^{-1} \log \sum_{s^n \in \{s^n\}} p^\alpha(s^n)
with the probabilities p(s^n) defined on the n-cylinders {s^n}. The Renyi specific
entropy, i.e., entropy per measurement, is approximated17 from the n-cylinder dis-
tribution by
h_\alpha(n) = n^{-1} H_\alpha(n)
or h_\alpha(n) = H_\alpha(n) - H_\alpha(n - 1)
and is given asymptotically by

h_\alpha = \lim_{n \to \infty} h_\alpha(n) .
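These estimators are straightforward to evaluate from empirical cylinder counts. The sketch below (the fair-coin data stream and window length are our illustrative choices) recovers the specific entropy h_α = log 2 of a binary full shift for both the topological (α = 0) and correlation (α = 2) orders.

```python
import math
import random
from collections import Counter

def renyi_total_entropy(seq: str, n: int, alpha: float) -> float:
    """Estimate the alpha-order total Renyi entropy H_alpha(n) from the
    empirical n-cylinder distribution of a symbol stream (a sketch)."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if abs(alpha - 1.0) < 1e-12:           # Shannon limit
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)

# Fair-coin data: every specific entropy h_alpha(n) = H_alpha(n)/n is log 2.
random.seed(2)
stream = "".join(random.choice("01") for _ in range(200_000))
h0 = renyi_total_entropy(stream, 8, 0.0) / 8   # topological (counting) case
h2 = renyi_total_entropy(stream, 8, 2.0) / 8   # a metric-side order
```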
The parameter α has several interpretations, all of interest in the present context.
From the physical point of view, α (= 1 − β) plays the role of the inverse
temperature β in the statistical mechanics of spin systems.39 The spin states correspond
to measurements and a configuration of spins on a spatial lattice to a temporal
sequence of measurements. Just as the temperature increases the probability of
different spin configurations by increasing the number of available states, α accentuates
different subsets of measurement sequences in the asymptotic distribution.
From the point of view of Bayesian inference, α is a Lagrange multiplier specifying
a maximum entropy distribution consistent with the maximum likelihood distribution
of observed cylinder probabilities.41 Following symbolic dynamics terminology,
α = 0 will be referred to as the topological or counting case; α = 1, as the metric
or probabilistic case or high-temperature limit. Varying α moves continuously from
topological to metric machines. Originally, in his studies of generalized information
measures, Renyi introduced α as just this type of interpolation parameter and noted
that the α-entropy has the character of a Laplace transform of a distribution.60 Here
there is the somewhat pragmatic, and possibly more important, requirement for α:
it gives the proper algebra of trajectories in orbit space. That is, α is necessary
for computing measurement sequence probabilities from the stochastic connection
matrix T_α. Without it, products of T_α fail to distinguish distinct sequences.
An ε-machine's structure determines several key quantities. The first is the
stochastic DFA measure of complexity. The α-order graph complexity is defined as

C_\alpha = (1 - \alpha)^{-1} \log \sum_{v \in V} p_v^\alpha
\bar{p}_V = \{ p_v : v \in V \}
A complexity based on the asymptotic edge probabilities \bar{p}_E = \{ p_e : e \in E \} can also
be defined

C_\alpha^E = (1 - \alpha)^{-1} \log \sum_{e \in E} p_e^\alpha .
\bar{p}_E is given by the left eigenvector of the ε-machine's edge graph. The transition
complexity C_\alpha^E is simply related to the entropy and graph complexity by

C_\alpha^E = C_\alpha + h_\alpha .

There are, thus, only two independent quantities for a finite DFA ε-machine.27
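As a sketch, C_α can be computed directly from a machine's asymptotic state probabilities. The stationary distribution (2/3, 1/3) used below is that of the golden-mean machine with fair-coin branching, our illustrative example rather than a value from the text.

```python
import math

def graph_complexity(p_v, alpha: float) -> float:
    """C_alpha = (1 - alpha)^{-1} log sum_v p_v^alpha over the machine's
    asymptotic state probabilities (Shannon form in the alpha -> 1 limit)."""
    if abs(alpha - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in p_v if p > 0)
    return math.log(sum(p ** alpha for p in p_v if p > 0)) / (1.0 - alpha)

# Stationary state probabilities of the golden-mean machine (illustrative).
p_v = [2.0 / 3.0, 1.0 / 3.0]
c0 = graph_complexity(p_v, 0.0)   # topological: log of the state count
c1 = graph_complexity(p_v, 1.0)   # metric: Shannon information in the states
```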
The two limits for α mentioned above warrant explicit discussion. For the first,
topological case (α = 0), T_0 is the ℓ-digraph's connection matrix. The Renyi entropy
h_0 = \log \lambda_0 is the topological entropy h. And the graph complexity is

C_0 = \log |V| .

This is C(s|DFA): the size of the minimal DFA description, or "program," required
to produce sequences in the observed measurement language of which s is a member.
This topological complexity counts all of the reconstructed states. It is similar to
the regular language complexity developed for cellular-automaton-generated spatial
patterns. The DFAs in that case were constructed from known equations of motion
and an assumed neighborhood template. Another related topological complexity
counts just the recurrent states V_r. The distinction between this and C_0 should be
clear from the context in which they are used in later sections.
In the second, metric case (α = 1), h_α becomes the metric entropy

h_\mu = \lim_{\alpha \to 1} h_\alpha = -\left. \frac{d\lambda_\alpha}{d\alpha} \right|_{\alpha = 1} .
The metric complexity

C_\mu = \lim_{\alpha \to 1} C_\alpha = -\sum_{v \in V} p_v \log p_v
is the Shannon information contained in the morphs.[18] Following the preceding remarks,
the metric entropy is also given directly in terms of the stochastic connection
matrix
h_\mu = -\sum_{v \in V} \sum_{v' \in V} \sum_{s \in A} p_v \, p(v'|v; s) \log p(v'|v; s) .
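The entropy formula above can likewise be evaluated from a machine's stationary state probabilities and labeled transition probabilities. The golden-mean machine with fair-coin branching used below is our illustrative example; it gives h_μ = (2/3) log 2.

```python
import math

def metric_entropy(p_v, trans) -> float:
    """h_mu = -sum_v p_v sum_{(v', s)} p(v'|v; s) log p(v'|v; s), evaluated
    from stationary state probabilities and labeled transitions (a sketch)."""
    h = 0.0
    for v, p in enumerate(p_v):
        for (v_next, s), q in trans[v].items():
            if q > 0.0:
                h -= p * q * math.log(q)
    return h

# Golden-mean machine (illustrative): state 0 emits '0' (stay) or '1'
# (go to state 1) with probability 1/2 each; state 1 must emit '0'.
p_v = [2.0 / 3.0, 1.0 / 3.0]
trans = [{(0, "0"): 0.5, (1, "1"): 0.5}, {(0, "0"): 1.0}]
h_mu = metric_entropy(p_v, trans)
```

Only the branching state contributes: the forced transition out of state 1 has log-probability zero.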
c_\alpha = \lim_{L \to \infty} \frac{C_\alpha(L)}{L}
vanishes, then the noisy dynamical system has been identified. If it does not vanish,
then cc, is a measure of the rate of divergence of the model size and so quantifies
a higher level of computational complexity. In this case, the model basis must be
augmented in an attempt to find a finite description at some higher level. The fol-
lowing sections will demonstrate how this can happen. A more complete discussion
of reconstructing various hierarchies of models is found elsewhere.29
[18]Cf. the "set complexity" version of the regular language complexity35 and the "diversity" of undirected,
unlabeled trees.4
PERIOD-DOUBLING CASCADES
To give this general framework substance and to indicate the importance of quan-
tifying computation in physical processes, the following sections address a concrete
problem: the complexity of cascade transitions to chaos. The onset of chaos often
occurs as a transition from an ordered (solid) phase of periodic behavior to a dis-
ordered (gas) phase of chaotic behavior. A cascade transition to chaos consists of a
convergent sequence of individual "bifurcations," either pitchfork (period-doubling)
in the periodic regimes or band-merging in the chaotic regimes.[19]
The canonical model class of these transitions is parametrized two-lap maps of
the unit interval, x_{n+1} = f(x_n), x_n ∈ [0, 1], with negative Schwarzian derivative,
that is, those maps with two monotone pieces and admitting only a single attractor.
We assign to the domain of each piece the letters of the binary alphabet Σ = {0, 1}.
The sequence space Σ* consists of all 0-1 sequences. Some of these maps, such as
the piecewise-linear tent map described in a later section, need not have the period-
doubling portion of the cascade. Iterated maps are canonical models of cascade
transitions in the sense that the same bifurcation sequence occurring in a set of
nonlinear ordinary differential equations (say) is topologically equivalent to that
found in some parametrized map.12,32,37
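A numerical sketch of the periodic side of such a cascade, using the logistic map x → rx(1 − x) as the parametrized map (a standard representative of this class; the parameter values below are illustrative): the attractor's period doubles as r crosses successive pitchfork bifurcations.

```python
def attractor_period(r: float, max_period: int = 64) -> int:
    """Detect the period of the logistic map's attractor at parameter r.
    Returns 0 if no period <= max_period is found (e.g., chaotic regime)."""
    x = 0.4
    for _ in range(10000):            # discard the transient
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(max_period):
        x = r * x * (1.0 - x)
        orbit.append(x)
    for p in range(1, max_period):
        if abs(orbit[-1] - orbit[-1 - p]) < 1e-9:
            return p
    return 0

p1 = attractor_period(3.2)   # between the first and second pitchforks
p2 = attractor_period(3.5)   # after the second period doubling
```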
Although ε-machines were developed in the context of reconstructing computa-
tional models from data series, the underlying theory provides an analytic approach
to calculating entropies and complexities for a number of dynamical systems. This
allows us to derive in the following explicit bounds on the complexity and entropy
for cascade routes to chaos. We focus on the periodic behavior near pitchfork bifur-
cations and chaotic behavior at band mergings with arbitrary basic periodicity.14,15
In distinction to the description of universality of the period-doubling route to
chaos in terms of parameter variation,31 we have found a phase transition in com-
plexity that is not explicitly dependent on control parameters.25 The relationship
between the entropy and complexity of cascades can be said to be super-universal
in this sense. This is similar to the topological equivalence of unimodal maps of
the interval,13,36,51,52,55 except that it accounts for statistical and computational
structures associated with the behavior classes.
In this and the next sections we derive the total entropy and complexity as a
function of cylinder length n for the set of ε-machines describing the behavior at
the different parameter values for the period-doubling and band-merging cascades.
The sections following this, then, develop several consequences, viz. the order and
the latent complexity of the cascade transition. With these statistical mechanical
results established, the discussion turns to a detailed analysis of the higher level
computation at the transition itself.
1191 The latter are not, strictly speaking, bifurcations in which an eigenvalue of the linearized
problem crosses the unit circle. The more general sense of bifurcation is nonetheless a useful
shorthand for qualitative changes in behavior as a function of a control parameter.
H_\alpha(n, m) = \log N(n, m)
For periodic behavior and assuming n > P, the number of n-cylinders is given by
the period: N(n, m) = P. The total entropy is then H_\alpha(n, m) = \log P. Note that, in
this case, h_\alpha vanishes.
Similarly, the complexity is given in terms of the number V_r = |V_r| of recurrent
states

C_\alpha = (1 - \alpha)^{-1} \log \sum_{v \in V_r} p_v^\alpha
        = (1 - \alpha)^{-1} \log V_r^{1 - \alpha}
        = \log V_r .
The number V_r of vertices is also given by the period for periodic behavior and so
we find C_\alpha = \log P. Thus, for periodic behavior the relationship between the total
and specific entropies and complexity is simple

C_\alpha = H_\alpha
or C_\alpha = n h_\alpha(n) .
This relationship is generally true for periodic behavior and is not restricted to the
situation where dynamical systems have produced the data. Where noted in the
following we will also use C_0 = \log |V| to measure the total number of machine
states.
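For periodic symbolic data these relations are easy to verify numerically; the sketch below checks H_0(n) = log P for a period-4 orbit (the symbol string is our illustrative example).

```python
import math

def total_entropy_topological(seq: str, n: int) -> float:
    """H_0(n) = log N(n): the log of the number of distinct n-cylinders."""
    return math.log(len({seq[i:i + n] for i in range(len(seq) - n + 1)}))

P = 4
seq = "0110" * 500                         # a period-4 symbolic orbit
H0 = total_entropy_topological(seq, 10)    # expect log P once n >= P
```

Each of the P phases of the cycle contributes exactly one distinct n-cylinder, so the count saturates at P.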
CHAOTIC CASCADES
In the chaotic regime the situation is much more interesting. The ε-machines at
periodicity q = 1 and m-order band-mergings 2^m → 2^{m-1}, m = 0, 1, 2, 3, are shown
in Figures 6-9.
FIGURE 6 Topological ℓ-digraph for the single-band chaotic attractor.
FIGURE 7 Topological ℓ-digraph for the 2 → 1 band chaotic attractor.
FIGURE 8 Topological ℓ-digraph for the 4 → 2 band chaotic attractor.
FIGURE 9 Topological ℓ-digraph for the 8 → 4 band chaotic attractor.
The graph complexity is still given by the number V_r of recurrent states as
above. The main analytic task comes in estimating the total entropy. In contrast
to the periodic regime the number of distinct subsequences grows with n-cylinder
length for all n. Asymptotically, the growth rate of this count is given by the specific
topological entropy. In order to estimate the total topological entropy at finite n,
however, more careful counting is required than in the periodic case. This section
develops an exact counting technique for all cylinder lengths that applies at chaotic
parameter values where the orbit f^n(x^*) of the critical point x^*, where f'(x^*) = 0,
is asymptotically periodic. These orbits are unstable and embedded in the chaotic
attractor. The set of such values is countable. At these (Misiurewicz) parameters
there is an absolutely continuous invariant measure.54
There is an additional problem with the arguments used in the periodic case.
The uniform distribution of cylinders no longer holds. The main consequence is
that we cannot simply translate counting N(n, m) directly into an estimate of
H_{α>0}(n, m). One measure of the degree to which this is the case is given by the
difference between the topological entropy h and the metric entropy h_μ.17
Approximations for the total Renyi entropy can be developed using the exact
cylinder-counting methods outlined below and the machine state and transition
probabilities. The central idea for this is that the states represent a
Markov partition of the symbol sequence space Σ*. There are invariant subsets of
Σ*, each of which converges at its own rate to "equilibrium." Each subset obeys
the Shannon-McMillan theorem7 individually. At each cylinder length each subset
is associated with a machine state. And so the growth in the total entropy in each
subset is governed by the machine's probabilistic properties. Since the cylinder-
counting technique captures a sufficient amount of the structure, however, we will
not develop the total Renyi entropy approximations here and instead focus on the
total topological entropy.
We now turn to an explicit estimate of N(n, m) for various cases. Although
the techniques apply to all Misiurewicz parameters, we shall work through the
2 → 1 band-merging case explicitly.
N(n, 1) = 1 + Σ_{i=0}^{⌊(n−1)/2⌋} 2^i + Σ_{i=0}^{⌊(n−2)/2⌋} 2^i

where ⌊k⌋ is the largest non-negative integer not greater than k. The second term on the
right counts the number of tree nodes that branch at even-numbered levels, the third
term is the number that branch at odd levels, and the first term counts the transient
spine that adds a single cylinder. For n > 2 and even, this can be developed into a
renormalized expression that yields a closed form as follows
FIGURE 11 Subtree of nodes associated with asymptotic vertices in the ℓ-digraph for two bands merging to one.
N(n, 1) = 1 + 2 Σ_{i=0}^{n/2−1} 2^i
        = 1 + 2 (N(n, 1) − 2^{n/2})

or N(n, 1) = 2 (2^{n/2} − 2^{−1}) .

For n > 2 and odd, we find N(n, 1) = 3 · 2^{(n−1)/2} − 1. This gives an upper bound
on the growth envelope as a function of n; the former is a lower bound.
The analogous expression for the 4 → 2 band cylinder count can be explicitly
developed. Figure 12 shows the transient spine on the tree that determines the
counting structure. In this case, the sum is

N(n, 2) = 2 + 2^{⌊(n−1)/4⌋} + 2^{⌊(n−2)/4⌋} + Σ_{k=3}^{6} Σ_{i=0}^{⌊(n−k)/4⌋} 2^i .
There are seven terms on the right-hand side. In order they account for
1. The two transient cycles, begun on 0 and 1, each of which contributes 1 node
per level;
2. Cycles on the attractor that are fed into the attractor via non-periodic tran-
sients (second and third terms); and
3. Sum over tree nodes that branch by a factor of 2 at level k + 4i, k = 3,4,5,6,
respectively.
FIGURE 12 Transient spine for the 4 → 2 band attractor. The asymptotic subtrees are labeled with the associated ℓ-digraph vertex. (Compare Figure 8.)
The sum greatly simplifies upon rescaling the indices to obtain a self-similar form.
For n > P = 4 and n ≡ 0 (mod 4), we find

N(n, 2) = 4 (2^{n/4} − 2^{−2}) .
For completeness we note that this approach also works for the single band
(m = 0) case

N(n, 0) = 1 + Σ_{i=0}^{n−1} 2^i
        = 1 + 2 Σ_{i=0}^{n−1} 2^i − (2^n − 1)
        = 2 N(n, 0) − 2^n

or N(n, 0) = 2^n .
The preceding calculations were restricted by the choice of a particular phase
of the asymptotic cycle at which to count the cylinders. With a little more effort
a general expression for all phases is found. Noting the similarity of the l-digraph
structures between different order band-mergings and generalizing the preceding
recursive technique yields an expression for arbitrary order band-merging. This
takes into account the fact that the generation of new n-cylinders via branching
occurs at different phases on the various limbs of the transient spine. The number
of n-cylinders from the exact enumeration for the q = 1, 2^m → 2^{m−1} band-merging
is

N(n, m) = 2^n                                    for m = 0
N(n, m) = 2^m (b_{n,m} 2^{n 2^{−m}} − 2^{−m})    for m ≠ 0

where n > P = 2^m and b_{n,m} = (1 + ñ) 2^{−ñ}, with ñ = 2^{−m}(n mod 2^m), accounts
for the effect of relative branching phases in the spine. This coefficient is bounded:

b_min = inf_{n, m≠0} b_{n,m} = 1
b_max = sup_{n, m≠0} b_{n,m} = 3 · 2^{−3/2} ≈ 1.0606602
The second bound follows from noting that the maximum occurs when, for example,
n = 2^m + 2^{m−1}. Note that the maximum and minimum values of the prefactor
are independent of n and m. We will ignore the detailed phase
dependence, simply write b instead of b_{n,m}, and consider the lower bound case
of b = 1.
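Taking the counting formula as reconstructed here at face value, a short numerical cross-check (a sketch; the function names are ours) confirms that the general expression reproduces the m = 1 closed forms for even and odd n, the m = 0 full binary tree, and the stated bounds on b_{n,m}:

```python
# Cross-check of the band-merging cylinder-count formula as reconstructed
# here: N(n, m) = 2^m * (b(n, m) * 2^(n * 2^-m) - 2^-m) for m != 0, with
# phase factor b(n, m) = (1 + nt) * 2^(-nt), nt = 2^-m * (n mod 2^m).
from math import isclose

def b(n, m):
    nt = (n % 2**m) / 2**m           # relative branching phase in [0, 1)
    return (1 + nt) * 2**(-nt)

def N(n, m):
    if m == 0:
        return 2**n                  # single chaotic band: full binary tree
    return 2**m * (b(n, m) * 2**(n * 2**-m) - 2**-m)

# m = 1 (two bands merging to one): the even/odd closed forms
for n in range(4, 20, 2):            # even n: N = 2^(n/2 + 1) - 1
    assert isclose(N(n, 1), 2**(n // 2 + 1) - 1)
for n in range(5, 21, 2):            # odd n:  N = 3 * 2^((n-1)/2) - 1
    assert isclose(N(n, 1), 3 * 2**((n - 1) // 2) - 1)

# The phase factor is bounded, 1 <= b <= 3 * 2^(-3/2) ~ 1.0606602, with
# the supremum at n = 2^m + 2^(m-1), i.e. nt = 1/2.
assert isclose(b(2**3 + 2**2, 3), 3 * 2**-1.5)
assert min(b(n, 2) for n in range(100)) == 1.0
```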
Recalling that C₀ = log₂ V_r = m, we have

H₀(n, m) = C₀ + log₂ (2^{n 2^{−m}} − 2^{−m})

where we have set b = 1. The first term recovers the linear interdependence that
derives from the asymptotic periodicity; cf. the period-doubling case. The second
term is due to the additional feature of chaotic behavior that, in the band-merging
case, is reflected in the branching and transients in the 1-digraph structure. In
terms of the modeling decomposition introduced at the beginning, the first term
corresponds to the periodic process Pt and the branching portion of the second
term, to components isomorphic to the Bernoulli process Bt.
From the development of the argument, we see that the factor 2^{−m} in the
exponent controls the branching rate in the asymptotic cycle and so should be
related to the rate of increase of the number of cylinders. The topological entropy
is the growth rate of H₀ and so can now be determined directly

h₀ = lim_{n→∞} n^{−1} H₀(n, m) = 2^{−m} .
Rewriting the general expression for the lower bound in a chaotic cascade makes it
clear how h₀ controls the total entropy

H₀(n) = C₀ + log₂ (2^{h₀ n} − h₀)

where h₀ = f/V_r is the branching ratio of the number f of vertices that branch to
the total number V_r of recurrent states.
The above derivation used periodicity q = 1. For general periodicity band-merging,
we have V_r = q · 2^m and f = 1. It is clear that the expression works
for a much wider range of ε-machines with isolated branching within a cycle that
do not derive from cascade systems. Indeed, the results concern the relationship
between eigenvalues and asymptotic state probabilities in the family of labeled
Markov chains with isolated branching among cyclic recurrent states.
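The eigenvalue statement can be illustrated numerically. The sketch below (a hypothetical 4-state example of ours, not from the text) builds a cyclic labeled Markov chain with a single branching vertex and verifies that the growth rate of distinct label sequences is h₀ = f/V_r:

```python
# Topological entropy of a labeled Markov chain with isolated branching:
# V_r states in a cycle, exactly f = 1 vertex with two outgoing, distinctly
# labeled edges. Distinct label sequences of length n then number
# 2^(#visits to the branching vertex), so h0 = f / V_r bits per symbol.
from math import log2

V = 4                                 # V_r: recurrent states in the cycle
n_steps = 400
# paths[i] = number of distinct label paths currently ending at state i
paths = [0] * V
paths[0] = 1                          # start at the branching vertex
for _ in range(n_steps):
    new = [0] * V
    for i in range(V):
        succ = (i + 1) % V            # the cycle: i -> i+1 (mod V)
        fanout = 2 if i == 0 else 1   # vertex 0 branches: two labeled edges
        new[succ] += fanout * paths[i]
    paths = new

h0 = log2(sum(paths)) / n_steps       # empirical growth rate of word count
assert abs(h0 - 1 / V) < 0.01         # h0 = f / V_r with f = 1
```

Since every fourth step passes through the branching vertex, the word count after 400 steps is exactly 2^100, giving h₀ = 100/400 = 1/4.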
As a subset of all Misiurewicz parameter values, band-merging behavior has the
simplest computational structure. In closing this section, we should point out that
there are other cascade-related families of Misiurewicz parameters whose machines
are substantially more complicated in the sense that the stochastic element is more
than an isolated branching. Each family is described by starting with a general
labeled Markov chain as the lowest-order machine. The other family members are
obtained by applications of a period-doubling operator.12 Each is a product of
a periodic process and the basic stochastic machine. As a result of this simple
decomposition, the complexity-entropy analysis can be carried out. This will be
reported elsewhere. It explains many of the complexity-entropy properties above the
lower bound case of band-merging. The numerical experiments later give examples
of all these types of behavior.
where y is the solution of

y log₂ y − y + 1/2 = 0

and ΔC = C⁺ − C⁻. Along the periodic branch the entropy and complexity are equal, and so from the
previous development we see that ΔC = log₂ (b y − 1).
As the cylinder length n grows, the finite-n estimates n⁻¹H(n) go over to the entropy growth rates h_α. As a result, all of the periodic
behavior lies on the h_α = 0 line in the (h_α, C_α)-plane. This limiting behavior
is consistent with a zero-temperature phase transition of a one-spatial-dimension
spin system with finite interaction range.
This analysis of the cascade phase transition should be contrasted with the
conventional descriptions based on correlation function and mutual information
decay. The correlation length of a statistical mechanical system is defined most
generally as the minimum size L at which there is no qualitative statistical difference
between the system of size L and the infinite (thermodynamic limit) system. This
is equivalent in the present context to defining a correlation length L_α at which L-
cylinder α-order statistics are close to asymptotic.[20] If we consider the total entropy
H_α(L) as the (dis)order parameter of interest, then for finite ε-machines,[21] away
from the transition on the chaotic side, we expect its convergence to asymptotic
statistics to behave like
2^{H_α(L)} ∝ λ_α^L .

But for L sufficiently large

2^{H_α(L)} ∝ 2^{h_α L}
where h_α = log₂ λ_α. By this argument, the correlation length is simply related
to the inverse of the specific entropy: L_α ∝ h_α^{−1}. We would conclude, then, that
the correlation function description of the phase transition is equivalent in many
respects to that based on specific entropy.
Unfortunately, this argument, which is often used in statistical mechanics, con-
fuses the rate of decay of correlation with the correlation length. These quantities
are proportional only assuming exponential decay or, in the present case, assuming
finite ε-machines. The argument does indicate that as the transition is approached
the correlation length diverges since the specific entropy vanishes. For all behav-
ior with zero metric entropy, periodic or exactly at the transition, the correlation
length is infinite. As typically defined, it is of little use in distinguishing the various
types of zero-entropy behavior.
The correlation length in statistical mechanics is determined by the decay of
the two-point autocorrelation function
[20] Cf. the entropy "convergence knee" n_α.19
[21] The statistical mechanical argument, from which the following is taken, equivalently assumes
exponential decay of the correlation function.
where s_i is the ith symbol in the sequence s and H_α(·) is the Renyi entropy.22
Using this to describe phase transitions is an improvement over the correlation
function in that, for periodic data, it depends on the period P: I_α ∝ log₂ P. In
contrast, the correlation function in this case does not decay and gives an infinite
correlation length.
The convergence of cylinder statistics to their asymptotic (thermodynamic
limit) values is most directly studied via the total excess entropy18,25,35,58

F_α(L) = H_α(L) − h_α L .
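As a concrete sketch (our example, using the topological α = 0 case), the excess entropy of a periodic sequence plateaus at log₂ P once L exceeds the period, since H₀(L) saturates while h₀ = 0:

```python
# Total excess entropy F(L) = H(L) - h * L, sketched with the topological
# (alpha = 0) quantities: H0(L) = log2 N(L), where N(L) is the number of
# distinct L-cylinders. For a period-P sequence h0 = 0 and F0(L) saturates
# at log2 P, the stored "structural" information.
from math import log2

def H0(seq, L):
    """Topological block entropy: log2 of the number of distinct L-words."""
    return log2(len({seq[i:i + L] for i in range(len(seq) - L + 1)}))

P = 8
s = "10111010" * 64                   # period-8 sequence (a primitive word)
h0 = 0.0                              # periodic data: zero specific entropy

for L in range(P, 4 * P):
    F = H0(s, L) - h0 * L             # excess entropy estimate
    assert abs(F - log2(P)) < 1e-12   # plateaus at log2 P = 3 bits
```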
hierarchy. With this we obtain a much finer classification than is typical in phase
transition theory.
The structure of the limiting machine can be inferred from the sequence of
machines reconstructed at the 2^m → 2^{m+1} period-doubling bifurcations on the periodic
side and from those reconstructed at the 2^m → 2^{m−1} band-mergings on the chaotic side.
(Compare Figures 2 and 6, 3 and 7, 4 and 8, 5 and 9.) All graphs have transient
states of pair-wise similar structure, except that the chaotic machines have a period
2^{m−1} unstable cycle. All graphs have recurrent states of period 2^m. In the periodic
machines this cycle is deterministic. In the chaotic machines, although the states
are visited deterministically, the edges have a single nondeterministic branching.
The order of the phase transition depends on the structural differences between
the ε-machines above and below the transition to chaos. In general, if this structural
difference alters the complexity at constant entropy, then the transition will be
second order. At the transition to chaos via period doubling there is a difference in
the complexities due to
1. The single vertex in the asymptotic cycle that branches; and
2. The transient 2^{m−1} cycle in the machines on the chaotic side.
At constant complexity the uncertainty developed by the chaotic branching and
the nature of the transient spine determine the amount of dynamic information
production required to make the change from predictable to chaotic ε-machines.
The following two subsections summarize results discussed in detail elsewhere.
CRITICAL MACHINE
The machine M_c that accepts the sequences produced at the transition, although
minimal, has an infinite number of states. The growth of machine size |V(L)| versus
reconstruction cylinder size L at the transition is demonstrated in Figure 14. The
maximum growth is linear with slope c₀ = 3. Consequently, the complexity diverges
logarithmically.[24] The growth curve itself is composed of pieces with alternating
slope 2 and slope 4; the slope-4 pieces have the form

|V(L)| = 4L − 3 , 3 · 2^{m−1} < L < 2^{m+1} .
The slope 2 learning regions correspond to inferring more of the states that link
the upper and lower branches of the machine. (The basic structure will be made
clearer in the discussion of Figure 15 below.) The slope 4 regions are associated
with picking up pairs of states along the long deterministic chains that are the
upper and lower branches. Recalling the definition of c0, in a previous section, we
note that finite co indicates a constant level of complexity using a more powerful
computational model than the DFA.[25]
[25] The general framework for reconstructing machines at different levels in a computational hierarchy is presented elsewhere.29
transition from states signified by squares, the register production is performed first
and then the transition is made. The machine begins in the start state with a "1"
in the register.
M_c accepts the full language L_c produced at the transition, including the transient strings with various prefixes. At its core, though, is the simple recursive production (B → BB′) for the itinerary ω_c of the critical point x*. We will now explore
the structure of this sequence in more detail in order to see just what computational
capabilities it requires. We shall demonstrate how and where it fits in the Chomsky
hierarchy.
Before detailing the formal language properties of the symbol sequences generated
at the cascade transition, several definitions of restricted languages are in order.
First, of course, is the critical language itself Lc which we take to be the set of
all subsequences produced asymptotically by the dynamical system at the cascade
transition. M_c is a deterministic acceptor of L_c. Second, the most restricted language, denoted L₁, is the sequence of the itinerary ω_c of the map's maximum x*.
Note that L₃ is the further generalization including subsequences that start at any
symbol in ω_c.
With these various languages, we can begin to delineate the formal properties of
the transition behavior. First, we note that an infinite number of words occur in Lc
even though the metric entropy is zero. Additionally, there are an infinite number
of inadmissible sequences and so an infinite number of words in the complement
language L̄_c, i.e., words not in L_c. One consequence is that the transition is not
described by a subshift of finite type, since there is no finite list of words whose
concatenation generates L_c.3
Second, in formal language theory, "pumping lemmas" are used to prove that
certain languages are not in some language class.38 Typically this is tantamount to
demonstrating that particular recurrence or cyclic properties of the class are not
obeyed by sufficiently long words in the language in question. Regular languages
(RL) are those accepted by DFAs. Using the pumping lemma for regular languages,
it is easy to demonstrate that L ∈ {L₂, L₃, L_c} is not regular. This follows from
noting that there is no length n such that each word z ∈ L with |z| > n can be
broken into three subwords, z = uvw with |uv| ≤ n, where the middle (nonempty)
subword can be repeated arbitrarily many times. That is, sufficiently long strings
cannot be decomposed such that z = uvw ∈ L ⇒ u vⁱ w ∈ L ∀ i ≥ 0. In fact, no substrings
can be arbitrarily pumped. The lack of such a cyclic property also follows from
noting that in M_c all the states are transient and there are no transient cycles.
The observation of this structural property also leads to the conclusion that Lc is
also not finitely described at the next level of the complexity hierarchy: context-
free languages (CFL), i.e., those accepted by push-down automata. This can be
established directly using the pumping lemma for context-free languages.
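The length argument can be made concrete. In the sketch below (helper name ours), the words of L₂ generated by the production A → AA′ have lengths exactly 2^i, so the arithmetic progression of lengths that pumping would force cannot stay inside the language:

```python
# The pumping lemmas fail for L2 because its word lengths are exactly the
# powers of two: pumping a nonempty subword of length d would put words of
# lengths |z| + i*d (an arithmetic progression) into the language, but the
# gaps between consecutive powers of two grow without bound.
def l2_words(k):
    """First k words of L2 via the production A -> A A' (last bit flipped)."""
    w, out = "1", []
    for _ in range(k):
        out.append(w)
        w = w + w[:-1] + ("0" if w[-1] == "1" else "1")
    return out

words = l2_words(10)
lengths = [len(w) for w in words]
assert lengths == [2**i for i in range(10)]   # 1, 2, 4, 8, ...
# For any pumping length d, some gap 2^(i+1) - 2^i = 2^i eventually
# exceeds d, so the pumped lengths cannot all be powers of two.
```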
Third, in the structural analysis of M_c, we found states at which the following
production is applied: A → AA′, where A′ = s₀ ⋯ s̄_k if A = s₀ ⋯ s_k and s̄ is
the complement of s. This production generates L₁ and L₂. It is most concisely
expressed as a context-free Lindenmayer system.49 The general class is called 0L
grammars: G = {E, P, a} consisting of the symbol alphabet, production rules, and
start string, respectively. This computational model is a class of parallel rewrite
automata in which all symbols in a word have the production rules simultaneously
applied, with the neighboring symbols playing no role in the selection of which
production. The symbol alphabet is Σ = {0, 1}. The production rules P are quite
simple, P = {0 → 11, 1 → 10}, and start with the string σ = 1. This system
generates the infinite sequence L₁ and, allowing the choice of when to stop the
productions, it generates L₂ = {1, 10, 1011, 10111010, ...}.
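The 0L-system is small enough to run directly. A minimal sketch (function names ours):

```python
# The 0L-system G = {Sigma, P, sigma} with alphabet Sigma = {0, 1},
# productions P = {0 -> 11, 1 -> 10}, and start string sigma = "1".
# All symbols are rewritten in parallel at each step; stopping after
# k steps yields the k-th word of L2.
P = {"0": "11", "1": "10"}

def step(w):
    """One parallel rewrite: every symbol replaced simultaneously."""
    return "".join(P[c] for c in w)

w, words = "1", []
for _ in range(5):
    words.append(w)
    w = step(w)

assert words == ["1", "10", "1011", "10111010", "1011101010111011"]
```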
Although the L-system model of the transition behavior is quite simple, as a
class of models its parallel nature is somewhat inappropriate. L-systems produce
both "early" and "late" symbols in a string at every production step, whereas the
dynamical system in question produces symbols sequentially. This point is even
more obvious when these symbol sequences are considered as sequential measure-
ments. The associated L-system model would imply that the generating process had
an infinite memory of past measurements and accessed them arbitrarily quickly. The
model class is too powerful.
This can be remedied by converting the 0L-system to its equivalent in the
Chomsky hierarchy of sequential computation. The Chomsky equivalent is a restricted indexed context-free grammar G_c = {N, I, T, F, P, S}.1 A central feature
of the indexed grammars is that they are a natural extension of the context-free
languages that allow for a limited type of context sensitivity via indexed produc-
tions, while maintaining properties of context-free languages, such as closure and
decidability, that are important for compilation. For the limit language the com-
ponents are defined as follows. N = {S, T} is the set of nonterminal variables with
S → Tg → BgAg → FAg → 1Ag → 1E → 10   (1)

S → Tfg → BfgAfg → DgAfg → BgAgAfg → FAgAfg
  → 1AgAfg → 1EAfg → 10Afg → 10Cg → 10BgBg → 10FBg
  → 101F → 1011   (2)
Productions are applied to the left-most nonterminal in each step. Consequently, the
terminal symbols {0,1} are produced sequentially left to right in "temporal" order.
In the first line, notice how the indices distribute over the variables produced by the
production T → BA. When an indexed production is used, an index is consumed:
as in Bg → F in going from the first to the second line above.
All of the languages in the Chomsky hierarchy have dual representations as
grammars and as automata. The machine corresponding to an indexed context-free
language is the nested stack automaton (NSA).2 This is a generalization of the push-
down automaton: a finite-state control augmented with a last-in first-out memory
or stack. An NSA has the additional ability to move into the stack in a read-only
mode and to insert a new (nested) stack at the current stack symbol being read.
It cannot move higher in the stack until it has finished with the nested stack and
removed it. The restricted indexed context-free grammar for L2 is recognized by
the one-way nondeterministic NSA (1NNSA) shown in Figure 17. The start state
is q. The various actions label the state transition edges. $ denotes the top of the
current stack and the cent sign, the current stack bottom. The actions are one of
three forms:
1. α → β, where α and β are patterns of symbols on the top of the current stack;
2. α → {1, −1}, where these indicate moving the head up or down the stack,
respectively, upon seeing the pattern α at the current stack top;
3. (t, $t) → (1, $), where t is a symbol read off of the input tape and compared
to the symbol at the top of the stack. The "1" indicates that the input head
advances to the next symbol on the input tape. The symbol on the stack's top
is removed: $t → $.
In all but one case the actions are in the form of a symbol pattern on the top of
the stack leading to a replacement pattern and a stack head motion. The notation
on the figure uses a component-wise shorthand. For example, the productions are
implemented on the transition labeled ${S,T,T,C,D,E,F} → ${Tg,Tf,BA,BB,BA,0,1},
which is shorthand for the individual transitions $S → $Tg, $T → $Tf, $T → $BA,
$C → $BB, $D → $BA, $E → $0, and $F → $1. The operation of the 1NNSA mimics
the derivations in the indexed grammar. The nondeterminism here means that there
exists some set of transitions that will accept words from L2. L3 is accepted by the
same 1NNSA, but modified to accept when the end of the input string is reached
and the previous input has been accepted.
There are three conclusions to draw from these formal language results. First,
it should be emphasized that the particular details in the preceding analysis are not
essential. Rather, the most important remark is that the description at this higher
level is finite and, indeed, quite small. Despite the infinite DFA complexity, a simple
higher-level description can be found once the computational model is augmented.
Indeed, the deterministic Turing machine program to generate words in the limit
language is simple: (i) copy the current string on the tape onto its end and (ii) invert
the last bit. The limit language for the cascade transition uses little of the power
of the indexed grammars. The latter can recognize, for example, context-sensitive
languages. The limit machine is thus exceedingly weak in its implied computational
structure. Also, the only nondeterminism in the 1NNSA comes from anticipating the
length of the string to accept; a feature that can be replaced to give a deterministic
and so less powerful automaton.
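The two-instruction program just described can be sketched directly (names ours); it reproduces the words of the limit language:

```python
# The deterministic generation procedure for the limit language described
# above: (i) copy the current string onto its own end, then (ii) invert
# the last bit. Despite the infinite DFA complexity of the language, the
# program at this higher level is a few lines long.
def next_word(w):
    doubled = w + w                     # (i) copy the string onto its end
    flipped = "0" if doubled[-1] == "1" else "1"
    return doubled[:-1] + flipped       # (ii) invert the last bit

w = "1"
words = [w]
for _ in range(4):
    w = next_word(w)
    words.append(w)

assert words == ["1", "10", "1011", "10111010", "1011101010111011"]
```

Note that this sequential copy-and-invert procedure produces exactly the same words as the parallel 0L rewrite, as the text's equivalence argument requires.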
Second, it is relatively straightforward to build a continuous-state dynamical
system with an embedded universal Turing machine.[26] With this in mind, and for
its own sake, we note that by the above construction the cascade transition does not
have universal computation embedded in it. Indeed, it barely aspires to be much
more than a context-free grammar. With the formal language analysis we have
bounded the complexity at the transition to be greater than regular and context-
free languages and no more powerful than indexed context-free. Furthermore, the
complexity at this level is measured by a linearly bounded DFA growth rate c₀ = 3.
These properties leave open the possibility, though, that the language could be a
one-way nondeterministic stack automaton (1NSA).38
Finally, we demonstrated by an explicit analysis that nontrivial computation,
beyond information storage and transmission, arises at a phase transition. One
is forced to go beyond DFA models to the higher stack automaton level since the
[26] A two-dimensional map with an embedded 4-symbol, 7-state universal Turing machine53 was constructed.23
former require an infinite representation. These properties are only hinted at by the
infinite correlation length and the slow decay of two-point mutual information at
the transition.
LOGISTIC MAP
The preceding analysis holds for a wide range of nonlinear systems since it rests only
on the symbolic dynamics and the associated probability structure. It is worthwhile,
nonetheless, to test it quantitatively on particular examples. This is possible because
it rests on a (re)constructive method that applies to any data stream. This section
and the next report extensive numerical experiments on two one-dimensional maps.
The first is the logistic map, defined shortly, and the second, the piecewise linear
tent map.
The logistic map is a map of the unit interval given by

x_{n+1} = r x_n (1 − x_n)

where the parameter r controls the degree of nonlinearity; r/4 is the map's height
at its maximum x* = 1/2. This is one of the simplest, but nontrivial, nonlinear
dynamical systems. It is an extremely rich system about which much is known.12
It is fair to say, however, that even at the present time there are still a number of
unsolved mathematical problems concerning the behavior at arbitrary chaotic parameter values. The (generating) measurement partition is P = {[0, 0.5), [0.5, 1]}.
The machine complexity and information theoretic properties of this system
have been reported previously.25 Figure 18 shows the complexity versus specific
FIGURE 18 Observed complexity versus specific entropy estimate for the logistic map at 193 parameter values r ∈ [3, 4] within both periodic and chaotic regimes. Estimates on 32-cylinder trees with 16-cylinder subtree machine reconstruction, with the entropy estimated as H(16)/16, where feasible.
entropy for 193 parameter values r ∈ [3, 4]. One of the more interesting general
features of the complexity-entropy plot is clearly demonstrated by this figure: all
of the periodic behavior lies below the critical entropy H_c; and all of the chaotic,
above. This is true even if the periodic behavior comes from cascade windows of
periodicity q > 1 within the chaotic regime at high parameter values. The (h_α, C_α)
plot, therefore, captures the essential information processing, i.e., computation and
information production, in the period-doubling cascade independent of any explicit
system control.
The lower bound derived in the previous sections applies exactly to the periodic
data (H < H_c) and to the band-merging parameter values. The fit to the periodic
data is extremely accurate, verifying the linear relationship except for high periods
beyond that resolvable at the chosen reconstruction cylinder length. The fit in the
chaotic regime is also quite good. (See Figure 19.) The data are systematically
lower (≈2%) in entropy due to the use of the topological entropy in the analysis.
The measured critical entropy H_c and complexity C_c at the transition were 0.28
and 4.6, respectively.
TENT MAP
The second numerical example, the tent map, is in some ways substantially simpler
than the logistic map. It is given by

x_{n+1} = a x_n           if x_n ≤ 1/2
x_{n+1} = a (1 − x_n)     if x_n > 1/2

where the parameter a controls the height (= a/2) of the map at the maximum
x* = 1/2. The main simplicity is that there is no period-doubling cascade and,
for that matter, there are no stable periodic orbits, except at the origin for a < 1.
There is instead only a periodicity q = 1 chaotic band-merging cascade that springs
from x* at a = 1.
The piecewise linearity also lends itself to further analysis of the dynamics. Since
the map has the same slope everywhere, the Lyapunov exponent λ and the topological,
metric, and Renyi specific entropies are all equal and given by the slope: λ = h_α =
log₂ a. We can simply refer to these as the specific entropy. From this, we deduce
that, since h_α = 2^{−m} for 2^m → 2^{m−1} band-mergings, the parameter values there
are

a_m = 2^{2^{−m}} .
For L > 2^m the complexity is given by the band-merging period. And this, in turn,
is given by the number of bands. Thus, we have C_α = −log₂ h_α or

C_α = −log₂ log₂ a

as a lower bound for a > 1 and L > 2^m at an m-order band-merging.
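These band-merging relations are easy to tabulate (a sketch; purely the arithmetic above):

```python
# Lower-bound points (h, C) for the tent map band-merging cascade: at the
# m-order merging parameter a_m = 2^(2^-m), the specific entropy is
# h = log2(a_m) = 2^-m and the complexity is C = -log2(h) = m.
from math import log2, isclose

for m in range(0, 8):
    a_m = 2 ** (2 ** -m)             # band-merging parameter value
    h = log2(a_m)                    # = Lyapunov exponent = 2^-m
    C = -log2(h)                     # = -log2 log2 a_m
    assert isclose(h, 2 ** -m)
    assert isclose(C, m)             # complexity equals the merging order
```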
Since there is no stable periodic behavior, other than period one, there is a
forbidden region in the complexity-entropy plot below the critical entropy. The
system cannot exist at finite "temperatures" below H_c, except at absolute zero
h_α = 0.
Figure 20 gives the complexity-entropy plot for 200 parameter values a ∈ [1, 2].
There is a good deal of structure in this plot beyond the simple band-merging lower
bounds that we have concentrated on. Near each band-merging complexity-entropy
point, there is a slanted cluster of points. These are associated with families of
parameter values at which the iterates f^n(x*) are asymptotically periodic of various
periods. We shall discuss this structure elsewhere, except to note here that it also
appears in the logistic map, but is substantially clearer in this example.
Figure 21 shows band-merging data estimated from 16 and 20 cylinders along
with the appropriate theoretical curves for those and in the thermodynamic limit
(L = 256).
From the two numerical examples, it is clear that the theory quite accurately
predicts the complexity-entropy dependence. It can be easily extended in several
directions. Most notably, the families of Misiurewicz parameters associated with
unstable asymptotically periodic maxima can be completely analyzed. And this
appears to give some insight into the general problem of the measure of parameter
values where iterates of the maximum are asymptotically aperiodic. Additionally,
the computational analysis is being applied to transitions to chaos via intermittency
and via frequency-locking.
ACKNOWLEDGMENTS
The authors have benefited from discussions with Charles Bennett, Eric Friedman,
Chris Langton, Steve Omohundro, Norman Packard, Jim Propp, Jorma Rissanen,
and Terry Speed. They are grateful to Professor Carson Jeffries for his continuing
support. The authors thank the organizers of the Santa Fe Institute Workshop on
"Complexity, Entropy, and Physics of Information" (May 1989) for the opportunity
to present this work, which was supported by ONR contract N00014-86-K-0154.
REFERENCES
1. Aho, A. V. "Indexed Grammars - An Extension of Context-Free Grammars."
J. Assoc. Comp. Mach. 15 (1968):647.
2. Aho, A. V. "Nested Stack Automata." J. Assoc. Comp. Mach. 16 (1969):383.
3. Alekseyev, V. M., and M. V. Jacobson. "Symbolic Dynamics." Phys. Rep. 25
(1981):287.
4. Bachas, C. P., and B. A. Huberman. "Complexity and Relaxation of Hierar-
chical Structures." Phys. Rev. Lett. 57 (1986):1965.
5. Bennett, C. H. "Thermodynamics of Computation - A Review." Intl. J.
Theor. Phys. 21 (1982):905.
6. Bennett, C. H. "On the Nature and Origin of Complexity in Discrete, Homo-
geneous Locally-Interacting Systems." Found. Phys. 16 (1986):585.
7. Blahut, R. E. Principles and Practice of Information Theory. Reading, MA:
Addison-Wesley, 1987.
8. Brudno, A. A. "Entropy and The Complexity of the Trajectories of a Dynam-
ical System." Trans. Moscow Math. Soc. 44 (1983):127.
9. Chaitin, G. "On the Length of Programs for Computing Finite Binary Se-
quences." J. ACM 13 (1966):145.
10. Chaitin, G. "Randomness and Mathematical Proof." Sci. Am. May (1975):
47.
11. Chomsky, N. "Three Models for the Description of Language." IRE Trans.
Info. Theory 2 (1956):113.
12. Collet, P., and J.-P. Eckmann. Maps of the Unit Interval as Dynamical Sys-
tems. Berlin: Birkhauser, 1980.
13. Collet, P., J. P. Crutchfield, and J.-P. Eckmann. "Computing the Topological
Entropy of Maps." Comm. Math. Phys. 88 (1983):257.
14. Crutchfield, J. P., and B. A. Huberman. "Fluctuations and the Onset of
Chaos." Phys. Lett. 77A (1980):407.
15. Crutchfield, J. P., J. D. Farmer, N. H. Packard, R. S. Shaw, G. Jones, and R.
Donnelly. "Power Spectral Analysis of a Dynamical System." Phys. Lett. 76A
(1980):1.
16. Crutchfield, J. P., J. D. Farmer, and B. A. Huberman. "Fluctuations and
Simple Chaotic Dynamics." Phys. Rep. 92 (1982):45.
17. Crutchfield, J. P., and N. H. Packard. "Symbolic Dynamics of One-Di-
mensional Maps: Entropies, Finite Precision, and Noise." Intl. J. Theor.
Phys. 21 (1982):433.
18. Crutchfield, J. P., and N. H. Packard. "Noise Scaling of Symbolic Dynam-
ics Entropies." Evolution of Order and Chaos, edited by H. Haken. Berlin:
Springer-Verlag, 1982, 215.
19. Crutchfield, J. P., and N. H. Packard. "Symbolic Dynamics of Noisy Chaos."
Physica 7D (1983):201.
20. Crutchfield, J. P. Noisy Chaos. University of California, Santa Cruz, pub-
lished by University Microfilms Intl., Minnesota, 1983.
Results of Feynman and others have shown that the quantum formalism
permits a closed, microscopic, and locally interacting system to perform
deterministic serial computation. In this paper we show that this formal-
ism can also describe deterministic parallel computation. Achieving full
parallelism in more than one dimension remains an open problem.
INTRODUCTION
In order to address questions about quantum limits on computation, and the possi-
bility of interpreting microscopic physical processes in informational terms, it would
be useful to have a model which acts as a bridge between microscopic physics and
computer science.
Feynman and others2,6,10 have provided models in which closed, locally interacting microscopic systems described in terms of the quantum formalism perform
deterministic computations. Up until now, however, all such models implemented
deterministic serial computation, i.e., only one part of the deterministic system is
active at a time.
We have the prejudice that things happen everywhere in the world at once,
and not sequentially like the raster scan which sweeps out a television picture.
It would be surprising, and perhaps a serious blow to attempts to ascribe some
deep significance to information in physics, if it were impossible to describe parallel
computations within the quantum formalism.
In this paper, we extend the discussion of a previous paper10 to obtain for
the first time a satisfactory model of parallel "quantum" computation, but only
in one dimension. The two-dimensional system discussed in the previous paper10
is also shown to be a satisfactory model, but the technique used here only allows
one dimension to operate in parallel: the more general problem of the possibility
of fully parallel two- or three-dimensional quantum computation remains open.
COMPUTATION
The word computation is used in many contexts. Adding up a list of numbers is a
kind of computation, but this task requires only an adding machine, not a general
purpose computer. Similarly, we can compute the characteristics of airflow past an
aircraft's wing by using a wind tunnel, but such a machine is no good for adding
up a list of numbers.
An adding machine and a wind tunnel are both examples of computing ma-
chines: machines whose real purpose is not to move paper or air, but to manipulate
information in a controlled manner. It is the rules that transform the information
that are important: whether the adding machine uses enormous gears and springs,
or microscopic electronic circuits, as long as it follows the addition algorithm cor-
rectly, it is acting as an adding machine.
A universal computer is the king of computing machines: it can simulate the
information transformation rules of any physical mechanism for which these rules
are known. In particular, it can simulate the operation of any other universal
computer—thus all universal computers are equivalent in their simulation capabil-
ities. It is an unproven, but thus far uncontradicted contention of computer theory
that no mechanism is any more universal than a universal digital computer, i.e.,
one that manipulates information in a discrete form.
Assuming a finite universe, no machine can have a truly unbounded memory;
what we mean when we talk about a general purpose computer is a machine that,
if it could be given an unbounded amount of memory, would be a universal com-
puter. (In common usage, the terms general purpose computer and computer are
synonymous.) Similarly, when we talk about a finite set of logic elements as being
universal, we mean that an unbounded collection of such elements could constitute
a universal computer.
Parallel Quantum Computation 275
QUANTUM COMPUTATION
Although all general purpose computers can perform the same computations, some
of them work faster, use less energy, weigh less, are quieter, etc., than others. In
general, some make better use of the computational opportunities and resources
offered by the laws of physics than do others. For example, since signals travel so
slowly (it takes about a nanosecond to go a foot, at the speed of light), there is a
tremendous speed advantage in building computers which have short signal paths.
Modern microprocessors have features that are only a few hundred atoms across:
such small components can be crowded close together, allowing the processor to be
small, light, and fast.
As we try to map our computations more and more efficiently onto the laws
and resources offered by nature, we are eventually confronted with the question
of whether or not we can arrange for extremely microscopic physical systems to
perform computations. What we ask here is in a sense the opposite of the hidden
variables question: we ask not whether a classical system can simulate a quantum
system in a microscopic and local manner, but rather, whether a quantum system
can simulate a classical system in such a manner.
All of our discussion of quantum computation will be based on autonomous
systems: we prepare the initial state, let the system undergo a Schrödinger evolution
as an isolated system, and after some amount of time we examine the result.[1]
Since the Schrödinger evolution is unitary, and hence invertible, we must base our
computations on reversible logic.
[1] For some types of computations, we can't set a very good limit on how long we should let it
run before looking. In such cases, we would simply start a new computation if we look and find
that we aren't finished.
276 Norman Margolus
REVERSIBLE COMPUTATION
Until recently, it was thought that computation is necessarily irreversible: it was
hard, for instance, to imagine a useful computer in which one could not simply erase
the contents of a register. It was to most people a rather surprising result3,7,8,9,13
that computers can be constructed completely out of invertible logic elements,
and that such machines can be about as easy to use as conventional computers.
This result has thermodynamic consequences, since it turns out that a reversible
computer is the most energy efficient engine for transforming information from
one form to another. This result also means that computation is not necessarily a
(statistically irreversible) macroscopic process.
As an example of an invertible logic element, consider the Fredkin gate of
Figure 1. This gate is its own inverse (two connected in series give the identity
function), and it is a universal logic element: any invertible logic function can be
constructed out of Fredkin gates. A logic circuit made out of Fredkin gates
looks much like any conventional logic circuit, except that special "mirror-image
circuit" techniques are used to avoid the accumulation of undesired intermediate
results that we aren't allowed to simply erase (see Fredkin and Toffoli7 for more
details).
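The Fredkin gate is easy to state concretely. Here is a minimal sketch in Python; the self-inverse property and the standard constant-input constructions of AND and NOT are checked exhaustively (the bit-level conventions below are the usual controlled-swap ones, not taken verbatim from the figure):

```python
from itertools import product

def fredkin(c, a, b):
    """Fredkin gate: a controlled swap. If the control bit c is 1,
    the two data bits a and b are exchanged; otherwise all pass through."""
    return (c, b, a) if c else (c, a, b)

# The gate is its own inverse: two in series give the identity on all 8 inputs.
for bits in product((0, 1), repeat=3):
    assert fredkin(*fredkin(*bits)) == bits

# Universality via constant inputs: with b = 0 the third output is c AND a,
# and with (a, b) = (0, 1) the third output is NOT c, while c itself passes
# through unchanged (giving fan-out of the control).
assert all(fredkin(c, a, 0)[2] == (c & a) for c in (0, 1) for a in (0, 1))
assert all(fredkin(c, 0, 1)[2] == 1 - c for c in (0, 1))
```

Note that the gate is also conservative: the number of 1-bits on its outputs always equals the number on its inputs, which is why constant inputs and "garbage" outputs appear in the AND/NOT constructions.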
Feynman made a quantum system simulate a collection of invertible logic gates
connected together in a combinational circuit (i.e., one without any feedback).[2]
In Feynman's construction, only one logic element was active (i.e., transforming its
inputs into outputs) at any given time: the different gates were activated one at a
time as they were needed to act on the output of gates that were active earlier. We
can imagine a sort of "fuse" running through our circuit: as the active part of the
fuse passes each circuit element in turn, it activates that element. Using a collection
of two-state systems (which he called atoms) to represent bits, Feynman made a
"quantum" version of this model. In what follows, we will think of our two-state
systems as spin-1/2 particles.
[2] Although combinational circuitry can perform any desired logical function, computers are usually constructed to run in a cycle, reusing the same circuitry over and over again. The parallel
models discussed later in this paper run in a cycle.
where a and b are lowering operators on the two spins, and a† and b† are their
Hermitian adjoints, which are raising operators on the two spins.
Without any claim yet to a connection with quantum mechanics, we can cast
the overall logical function implemented by an N-gate invertible combinational logic
function into the language of linear operators acting on a tensor product space as
follows:
F = Σ_{k=1}^{N} F_k c_k c†_{k+1}    (1)
where c_k is the lowering operator on the clock spin that passes next to the kth gate
F_k. If we start all of the clock spins off in the down state except for the spin next
to the first gate, then if F acts on this system, only the term

F_1 c_1 c†_2
will be nonvanishing. This term will cause the spins acted upon by the first gate to
be updated, the first clock spin will be turned down, and the second clock spin will
go up. Similarly, if F acts again, the second gate will update, and the up spin will
move to the third position. Clearly if the initial state has only a single clock spin
[3] Less physical models were proposed earlier by Benioff,1 who seems to have been the first to
raise the question of quantum computation in print.
up, F will preserve that property. Using the position of the up clock spin to label
the state, then if |1⟩ is the initial state, F|1⟩ = |2⟩, and in general F|k⟩ = |k + 1⟩.
We have thus been able to write the forward time-step operator as a sum of local
pieces by serializing the computation—only one gate in the circuit is active during
any given step.
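This clock mechanism can be verified numerically. The sketch below (NumPy assumed; the gate operators F_k are dropped, leaving only the clock dynamics) builds the sum of c_k c†_{k+1} terms on a few two-state clock spins and checks that it advances a single up spin by one position:

```python
import numpy as np

n = 4                                   # number of clock spins
c = np.array([[0., 1.], [0., 0.]])      # lowering operator: |up> -> |down>
cd = c.T                                # raising operator:  |down> -> |up>

def site_op(op, k):
    """Embed a single-spin operator at site k of the n-spin product space."""
    out = np.array([[1.]])
    for i in range(n):
        out = np.kron(out, op if i == k else np.eye(2))
    return out

# Clock part of the forward operator, with the gate operators F_k dropped:
F = sum(site_op(c, k) @ site_op(cd, k + 1) for k in range(n - 1))

def ket(k):
    """|k>: clock spin k up, all others down (site 0 is most significant)."""
    v = np.zeros(2 ** n)
    v[1 << (n - 1 - k)] = 1.
    return v

# F advances the clock by one position: F|k> = |k+1>,
# and the single-up-spin property is preserved.
for k in range(n - 1):
    assert np.allclose(F @ ket(k), ket(k + 1))
assert np.allclose(F @ ket(n - 1), 0)   # the last clock spin has no successor
```

Each term c_k c†_{k+1} annihilates every one-up-spin state except the one with the up spin at position k, which is why only a single term fires at each step.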
Notice that the operator F†_k is the inverse of F_k, since the role of raising and
lowering operators is interchanged. Similarly, F† is the inverse of F, since each term
of the former undoes the action of the corresponding term of the latter, including
moving the clock spin back one position. Now if we add together the forward and
backward operators, we get an Hermitian operator H = F + F† which is the sum
of local pieces, each piece acting only on a small collection of neighboring spins
(a gate). At this point we make contact with quantum mechanics, by seeing what
happens if we use this H as the Hamiltonian in a Schrödinger evolution.
If we expand the time evolution operator U(t) = e^{−iHt} we get

U(t) = 1 − iHt − (t²/2) H² + ⋯ = 1 − i(F + F†)t − (t²/2)(F + F†)² + ⋯

and so we get a sum of terms, each of which is proportional to F or F† to some
power. Thus if |k⟩ is evolved for a time t, it becomes U(t)|k⟩, which is a superposition of configurations of the serialized computation which are legitimate successors
and predecessors of |k⟩: each term in the superposition has a single clock spin at
some position, and the computation is in the corresponding state.
Feynman now noted that the operators F_k don't affect the dynamics of the c_k's:
we can consider F = Σ_k c_k c†_{k+1} for the purposes of analyzing the evolution of the
clock spins. But then H = F + F† supports superpositions of the one-spin-up states
called spin waves, as is well known. When we add back in the Fk's, the computation
simply rides along at a uniform rate on top of the clock spin waves. This point will
be discussed in more detail below, when we extend this serial model to deal with
parallel computation.
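This spin-wave behavior is simple to check numerically. On the one-up-spin subspace F reduces to the N×N shift matrix |k⟩ → |k+1⟩; the sketch below (NumPy assumed; a periodic chain is used for convenience) diagonalizes H = F + F† and confirms that U(t) = e^{−iHt} is unitary and spreads an initial step |1⟩ over both successors and predecessors:

```python
import numpy as np

N = 24
F = np.roll(np.eye(N), 1, axis=0)   # shift on the clock subspace: F|k> = |k+1 mod N>
H = F + F.T                         # H = F + F†: a Hermitian hopping chain

w, V = np.linalg.eigh(H)            # diagonalize once; U(t) = V e^{-iwt} V†
def U(t):
    return (V * np.exp(-1j * w * t)) @ V.T

psi0 = np.zeros(N); psi0[0] = 1.0   # computation at step |1>
psi = U(3.0) @ psi0

assert np.isclose(np.linalg.norm(psi), 1.0)      # evolution is unitary
assert np.allclose(U(3.0) @ U(-3.0), np.eye(N))  # and invertible
assert abs(psi[0]) < 1.0                         # amplitude has spread from |1>
assert abs(psi[1]) > 1e-3 and abs(psi[-1]) > 1e-3  # to successors and predecessors
```

The eigenstates of H are exactly the spin waves mentioned in the text; adding the F_k's back in makes the computation ride along on these waves without changing the clock dynamics.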
PARALLEL COMPUTATION
Serial computers follow an algorithm step by step, completing one step before be-
ginning the next; parallel computers make it possible to do several parts of the
problem at once in order to finish a computation sooner. Although Feynman's con-
struction is based on a serial model, his idea of concentrating all of the quantum
uncertainty into the time of completion, while leaving none in the correctness of the
computation, can be extended to parallel computations.10 Maintaining correctness
is again achieved simply by construction of the Hamiltonian: states in the Hilbert
space that correspond to configurations on a given computational orbit form an
invariant subspace under the Schrödinger evolution. This property of the Hamiltonian does not, in general, say anything about the rate at which we can compute.
Here we show that Feynman's technique for making a serial model of quantum com-
putation run at a constant rate can, in fact, also be extended to apply to a parallel
system, in particular to the one-dimensional analogue of the case considered in the
previous paper.10 From this, we can derive a way of making the two-dimensional
system considered in the previous paper10 compute at a constant rate, but with
parallelism that extends over only one dimension.
For simplicity, our discussion of parallel computers will be confined to cellular
automata (CA): uniform arrays of computing elements, each connected only to its
neighbors. These systems can be universal in the strong sense that a given universal
cellular automaton (assuming it's big enough) can simulate any other computing
structure of the same dimensionality at a rate that is independent of the size of
the structure.[4] By showing that, given any desired (synchronous) CA evolution,
we can write down a Hamiltonian that simulates it, we will have shown that the
QM formalism is computationally universal in this strong sense, at least for one-
dimensional rules.
Feynman's model involved only states in which a single site was active at a time.
In order to accommodate both neighbor interactions and parallelism in quantum
mechanics, we find that we are forced to consider asynchronous (no global time)
computing schemes (but still employing invertible logic elements). For suppose that
our Hamiltonian is a sum of pieces each of which only involves neighbor interactions
H = Σ_{x,y,z} H_{x,y,z}    (2)
Then consider the time evolution 1 − iHt over an infinitesimal time interval t. When
this operator acts on a configuration state of our system, we get a superposition of
configuration states: one term in the superposition for every term in the sum (Eq. 2)
above. If we want all of the terms in this superposition to be valid computational
states, then we must allow configurations in which one part has been updated, while
everything else has been left unchanged.
LOCAL SYNCHRONIZATION
One can perform an effectively synchronous computation using an asynchronous
mechanism by adding extra state variables to keep track of relative synchronization
(how many more times one portion of the system has been updated than an adjacent
portion). To use an analogy, consider a bucket brigade carrying a pile of stones up
a hill. You hand a stone to the first person in line, who passes it on to the next,
and so on up the hill. An asynchronous computation would correspond to each
individual watching the person ahead, and passing a stone along as soon as that
person has gotten rid of theirs. This involves only local synchronization. A
[4] This isn't the usual definition of universality in CA, but it is the one that we'll use here.
[Figures 2 and 3: a clock-bit configuration ⋯ 0 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 ⋯ for a one-dimensional pairing automaton, and the corresponding spacetime diagram of relative time phases (cf. text).]
Notice that with this scheme, two adjacent cells cannot get more than one step
apart in update-count: since this count is only used to tell whether a given cell is
using the even step pairing or the odd step pairing, and to tell if adjacent cells are
at the same step, we only need to look at the least significant bit of the update-count. Thus if we take our original synchronous automaton and add a single bit of
update-count to each cell, we can run the system asynchronously while retaining a
perfectly synchronous causality.
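The claim that a single clock bit per cell suffices can be tested directly in software. The sketch below (plain Python; the block rule is an arbitrary invertible pairing rule chosen for illustration) runs a periodic chain of cells both synchronously and with random asynchronous block updates gated by per-cell update counts, and checks that the final configurations agree:

```python
import random

# An arbitrary invertible rule on a cell pair (any bijection of the 4 states works).
RULE = {(0, 0): (0, 1), (0, 1): (1, 0), (1, 0): (1, 1), (1, 1): (0, 0)}

def synchronous(state, T):
    """T global steps; even steps pair (0,1),(2,3),..., odd steps pair (1,2),(3,4),..."""
    state = list(state)
    n = len(state)
    for t in range(T):
        for i in range(t % 2, n, 2):
            j = (i + 1) % n
            state[i], state[j] = RULE[state[i], state[j]]
    return state

def asynchronous(state, T, seed):
    """Random local updates; a pair may fire only when both cells are at the same
    step and that step's pairing includes them. Only the parity of the per-cell
    count (the 'clock bit') is consulted locally."""
    rng = random.Random(seed)
    state = list(state)
    n = len(state)
    count = [0] * n
    while min(count) < T:
        i = rng.randrange(n)
        j = (i + 1) % n
        # fire only if i is the left member of a block at its current step
        if count[i] == count[j] < T and count[i] % 2 == i % 2:
            state[i], state[j] = RULE[state[i], state[j]]
            count[i] += 1
            count[j] += 1
    return state

init = [0, 1, 1, 0, 1, 0, 0, 1]
# Any asynchronous update order reproduces the synchronous result.
assert asynchronous(init, 6, seed=1) == synchronous(init, 6)
assert asynchronous(init, 6, seed=2) == synchronous(init, 6)
```

Deadlock cannot occur: among the cells at the minimum update count there is always at least one block whose two members are both at that count, so the valleys of time always fill in.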
In Figure 2 we show a possible state for the update-count bits (henceforth we'll
call them clock bits) in a one-dimensional pairing automaton of the type we've been
discussing, which is consistent with an evolution starting from a synchronous initial
state. In Figure 3 we use a spacetime diagram to integrate the relative time phases:
arbitrarily calling the time at the left hand position t = 0, we mark cells using the
relative time information encoded in the clock bits. As we move across, if a cell is
at the same time as its neighbor to the left, we mark it at the same time on this
diagram, if it is ahead, we mark it one position ahead, etc. The result is a diagram
illustrating the hills and valleys of time present in this configuration. Note that we
can tell if a given cell in Figure 2 which is at a different time phase than its neighbor
to the left is ahead or behind this neighbor by seeing whether or not it is waiting
for the neighbor to catch up in order to be paired with it.
Note that if we allow backward steps, this synchronization scheme still works
fine: we can imagine that a backward step is simply undoing a forward step, get-
ting us to a configuration we could have gotten to by starting at an earlier initial
synchronous step, and running forward.
These configurations then, with their hills and valleys of time, will be the
classical configurations which our quantum system will simulate.
F = Σ_{i even} F_i c†_i c†_{i+1} + Σ_{i odd} F_i c_i c_{i+1}    (3)
This operator, acting on a state |n, α⟩, produces a superposition of states each
of which belongs to time n + 1. Similarly, F† takes us backwards one time step.
Note, however, that F† is not the inverse of F. Nevertheless, on the subspace of
computational configurations (those that can be obtained by a sequence of local
updates starting from a synchronous configuration) F and Ft commute: this prop-
erty, which will be proven below, will be crucial in our construction.
As before, we let H = F + F†, and if we expand the time evolution operator
U(t) = e^{−iHt} we get a superposition of terms, each involving products of F_i's
and F†_j's for various i's and j's. Since each such term, acting on a computational
configuration, gives us another computational configuration (by construction of the
clock bits), the time evolution U doesn't take us out of our computational subspace.
RUNNING IN PARALLEL
Now we would like to have our parallel computation run forward at a uniform
rate. We are imagining that our space is periodic: the chain of cells is finite and
the ends are joined. Designating one particular state of the equivalent globally
synchronous computation as t = 0, we can assign a value of t to every configuration
on each synchronous computational orbit, and from these assign a value of n to the
integrated time on every locally synchronized computational configuration. Thus
we can construct an operator N which, acting on a configuration |n, α⟩, returns n:

N |n, α⟩ = n |n, α⟩
It turns out that the one-dimensional version of Feynman's serial model is a special
case of the model discussed above: if we complement the meaning of every second
clock spin (say, all the ones at even positions), Eq. 3 becomes
F = Σ_{i even} F_i c_i c†_{i+1} + Σ_{i odd} F_i c_i c†_{i+1} = Σ_i F_i c_i c†_{i+1}
which is of exactly the same form as Eq. 1. An initial state containing a single up
clock spin and all the rest down would correspond, in our parallel system of Eq. 3,
to all of the even clock spins up, and all of the odd ones down, except for the spin
at the active position k, which is the same as its two neighbors. Since updating
in our parallel model only occurs at positions where two adjacent clock spins are
the same, there are only two active blocks in such an alternating configuration:
the block involving k and k + 1, which will be a step forward if updated, and the
block involving k and k — 1, which will be a step back if updated. If we draw a
spacetime diagram of the clock spins around position k (see Figure 4) showing the
relative synchronization implied by the alternating pattern of clock spins, we see
that it forms a staircase with a landing that moves up and down in time as its
leading edge or trailing edge is updated. Because the space is periodic, the top of
this staircase is connected to the bottom: this configuration is not on the orbit of
any synchronous parallel computation.
[Figure 4: spacetime diagram of the alternating clock-spin pattern (⋯ 0 1 0 1 0 1 0 0 0 1 0 1 0 1 ⋯) around the active position k, showing the staircase with a landing that moves up and down in time (cf. text).]
CONCLUSIONS
The study of the fundamental physical limits of efficient computation requires us
to consider models in which the mapping between the computational and physical
degrees of freedom is as close as is possible. This has led us to ask whether the
structure of quantum mechanics is compatible with parallel deterministic compu-
tation. If the answer were no, then such computation would in general have to be a
macroscopic phenomenon. In fact, at least in one dimension, it does seem possible
ACKNOWLEDGMENTS
I would like to gratefully acknowledge conversations with R. P. Feynman in which
he pointed out to me the relationship between my parallel model and his serial
model, and discussions with L. M. Biafore in which it became evident that the
one-dimensional version of my parallel QM construction might be made to run at
a uniform rate.
This research was supported by the Defense Advanced Research Projects Agen-
cy and by the National Science Foundation.
REFERENCES
1. Benioff, P. A. "Quantum Mechanical Hamiltonian Models of Discrete Pro-
cesses that Erase Their Own Histories: Application to Turing Machines." Int.
J. Theor. Physics 21 (1982):177-202.
2. Benioff, P. A. "Quantum Mechanical Hamiltonian Models of Computers." In
the proceedings of the conference "New Ideas and Techniques on Quantum
Measurement Theory," Jan. 1986. Ann. New York Acad. Sci. 480 (1986):475-
486.
3. Bennett, C. H. "Logical Reversibility of Computation." IBM J. Res. & Devel.
17 (1973):525.
4. Deutsch, D. "Quantum Theory, the Church-Turing Hypothesis, and Universal
Quantum Computers." Proc. Roy. Soc. Lond. A 400 (1985):97-117.
5. Feynman, R. P. "Simulating Physics with Computers." Int. J. Theor. Phys.
21 (1982):467.
6. Feynman, R. P. "Quantum Mechanical Computers." Opt. News 11 (1985).
7. Fredkin, E., and T. Toffoli. "Conservative Logic." Int. J. Theor. Phys. 21
(1982):219.
8. Landauer, R. "Irreversibility and Heat Generation in the Computing Pro-
cess." IBM J. Res. & Devel. 5 (1961):183.
9. Margolus, N. "Physics-Like Models of Computation." Physica 10D (1984):81.
10. Margolus, N. "Quantum Computation." In the proceedings of a conference
"New Ideas and Techniques on Quantum Measurement Theory," Jan. 1986.
Ann. New York Acad. Sci. 480 (1986):487-497.
11. Margolus, N. "Physics and Computation." Ph.D. Thesis, Tech. Rep.
MIT/LCS/TR-415, MIT Laboratory for Computer Science, 1988.
12. Peres, A. "Reversible Logic and Quantum Computers." Phys. Rev. A 32
(1985):3266-3276.
13. Toffoli, T. "Computation and Construction Universality of Reversible Cellu-
lar Automata." J. Comp. Sys. Sci. 15 (1977):213.
14. Toffoli, T. "Cellular Automata as an Alternative to (Rather than an Ap-
proximation of) Differential Equations in Modeling Physics." Physica 10D
(1984):117.
15. Toffoli, T., and N. Margolus. Cellular Automata Machines: A New Environ-
ment for Modeling. Cambridge: MIT Press, 1987.
16. Zurek, W. H. "Reversibility and Stability of Information Processing Sys-
tems." Phys. Rev. Lett. 53 (1984):391.
W. G. Teich and G. Mahler
Institut für Theoretische Physik, Universität Stuttgart, Pfaffenwaldring 57, 7000 Stuttgart 80,
FRG
1. INTRODUCTION
In the last couple of decades, enormous progress has been achieved in the miniaturization of hardware elements for computing devices based on conventional microelectronics. The 4-megabit chip is meanwhile in commercial use, and the
dimension of individual elements has already reached the submicrometer regime.
But for a length scale of the order of the de Broglie wavelength (typically a few
nanometers), quantization effects become important. The miniaturization process,
which so far has been limited by incomplete mastery of the technology needed to build the
respective structures (mask production, etching, etc.), is ultimately bounded by fundamental
physical constraints. In order to further reduce the size of the hardware elements
and to increase integration, new physical concepts of information processing must
be developed. Investigations in this direction can be summarized as "molecular
electronics."2 It is concerned with information processing systems where the ba-
sic elements have a dimension of a few nanometers and, therefore, possess typical
molecular properties like a discrete energy subspectrum. It is not limited to organic
macromolecules as a new class of substances, but includes possible realizations in
the form of semiconductor heterostructures ("quantum dots") as well.
FIGURE 1 Quantum optical model for a single cell A: three states |1⟩, |2⟩, |3⟩, with transition frequencies ω31 and ω32. The coupling between the states is indicated by the "overlap" of the respective levels (cf. text).
Information Processing at the Molecular Level 293
a single trapped ion.9 In this case the corresponding coupling between the states
is due to symmetry-based selection rules. However, the various states of a single
trapped ion do not possess different dipole moments and, therefore, cannot be cou-
pled via the dipole-dipole interaction as described below.
Applications include an array of uncoupled cells which might be used as an
information storage device, similar to persistent hole burning.8 Storage capacities
of up to 109 bits/cm2 might be achieved in this way.11
ħ ω31(4) − ħ ω31(5) = (p_A p_B / 4πε₀R³) F(θ_A, θ_B)    (1)
FIGURE 2 Schematic drawing of two coupled cells A and B with different transition
frequencies, separated by a distance R.
to apply frequencies which can simultaneously lead to transitions in cell A and cell
B.
The machine table (Table 2) describes the possible control over the system. By
two simultaneous laser pulses with frequencies ω31(4) and ω31(5), it is possible to
prepare state |2⟩ in cell A independent of the state of cell B. Similarly, all other
states can be prepared independently. By applying a laser pulse with the single
frequency ω32(4), the new state of cell A depends on the old states of cell A and
cell B. For suitable coding this mapping represents a logical "OR." Similarly, all
other elementary logical functions can be realized.13
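The conditional pulse can be mimicked with a toy model. In the sketch below (plain Python; the bit coding |1⟩ = 1, |2⟩ = 0 for A and |4⟩ = 1, |5⟩ = 0 for B is an assumption chosen for illustration, not taken from the paper), a pulse at ω32(4) flips A's |2⟩ → |1⟩ only when neighbor B is in |4⟩, and with this coding the resulting mapping is exactly a logical OR:

```python
def pulse_w32_4(a, b):
    """Toy model of a pulse at frequency w32(4): it drives cell A's |2> -> |1>
    transition only when neighbor B is in |4>, since the dipole-dipole shift
    detunes the transition when B is in |5>. States are labeled by strings."""
    if a == '2' and b == '4':
        a = '1'
    return a, b

# Assumed bit coding (for illustration only): A: |1> = 1, |2> = 0; B: |4> = 1, |5> = 0.
CODE_A = {'1': 1, '2': 0}
CODE_B = {'4': 1, '5': 0}

for a in '12':
    for b in '45':
        new_a, _ = pulse_w32_4(a, b)
        assert CODE_A[new_a] == (CODE_A[a] | CODE_B[b])   # the pulse computes OR
```

With a different coding or a different choice of rows from the machine table, the remaining elementary logic functions can be realized in the same way.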
δω of the laser pulse, which, however, must be smaller than the separation Δω
between different frequency bands. In this case the transition probability for each
cell depends only on the state of the adjacent cells. The required bandwidth can be
found to be13

δω ≈ 0.6 Δω .    (2)
In order to achieve distinct transition frequencies for each of the four possible
configurations of nearest neighbors, the frequency shift of the left and right neighbor
must be different. Due to the R⁻³ dependence of the dipole-dipole interaction, this
can be realized by an asymmetric arrangement of the cells, i.e., the distance to the
left and the right neighbor is different. This symmetry breaking physically defines
a direction on the chain, which in turn is necessary to obtain a directed information
flow in the system.
Since individual cells cannot be addressed selectively either in real space or
in frequency space, the preparation of the cellular structure must be performed
with the help of a shift operation. Starting from a physical inhomogeneity (i.e., a
cell D with transition frequencies distinct from cells A and B) any inhomogeneous
state can be prepared: a temporal pattern (successive preparations of cell D) is
transformed by successive shift operations into a spatial pattern of the cellular
structure (serial input).13 Similarly, the state of the 1-D cellular structure can be
measured.
FIGURE 3 Real space model for a linear chain of alternating cells A and B. The
distances to the left and right neighbor are in general different (R1 ≠ R2).
TABLE 2 Machine table for cells Ai and Bi.

cell Ai
B_{i−1}  B_i     |1⟩ → |2⟩   |2⟩ → |1⟩
|4⟩      |4⟩     ω31(4,4)    ω32(4,4)
|4⟩      |5⟩     ω31(4,5)    ω32(4,5)
|5⟩      |4⟩     ω31(5,4)    ω32(5,4)
|5⟩      |5⟩     ω31(5,5)    ω32(5,5)

cell Bi
A_i      A_{i+1}  |4⟩ → |5⟩   |5⟩ → |4⟩
|1⟩      |1⟩     ω64(1,1)    ω65(1,1)
|1⟩      |2⟩     ω64(1,2)    ω65(1,2)
|2⟩      |1⟩     ω64(2,1)    ω65(2,1)
|2⟩      |2⟩     ω64(2,2)    ω65(2,2)
[Figure 5: hierarchy of length scales of the system, from the atomic scale up to the macroscopic length.]
SUMMARY
For the example of an optically controlled multistable quantum system, we have
demonstrated the connection between a complex hierarchical structure and the
complex dynamics of the system. Different length scales define various hierarchical
levels of the system (Figure 5). The number of hierarchical levels of the system (in
our case five) is a measure of the "homogeneous" complexity of the system. The
minimum number of five hierarchical levels is a prerequisite in order to realize mul-
tistability, preparation, measurement, and control, necessary to achieve a complex
dynamics which is equivalent to information processing. Neither a perfect crystal
nor an ideal gas, both of which possess only two hierarchical levels (a macroscopic
length scale and an atomic length scale), fulfills these requirements.
ACKNOWLEDGMENT
Financial support by the Deutsche Forschungsgemeinschaft (Sonderforschungsbere-
ich 329) is gratefully acknowledged.
REFERENCES
1. Blum, K. "Density Matrix Theory and Applications." New York: Plenum
Press, 1981, 63.
2. Carter, F. L., ed. "Molecular Electronic Devices." New York: Marcel Dekker,
1982.
3. Deutsch, D. "Quantum Theory, the Church-Turing Principle and the Univer-
sal Quantum Computer." Proc. R. Soc. London A 400 (1985):97.
4. Ferry, D. K., and W. Porod. "Interconnections and Architecture for Ensemble
of Microstructures." Superlatt. Microstruct. 2 (1986):41.
5. Feynman, R. P. "Quantum Mechanical Computers." Opt. News 11 (1985):11.
6. Landauer, R. "Irreversibility and Heat Generation in the Computing Process." IBM J. Res. and Dev. 5 (1961):183.
7. Landauer, R. "Fundamental Limitations in the Computational Process." Berichte der Bunsen-Gesellschaft für Physikalische Chemie 80 (1976):1041.
8. Moerner, W. E., ed. "Persistent Spectral Hole-Burning: Science and Applica-
tions." Berlin: Springer-Verlag, 1988.
9. Nagourney, W., J. Sandberg, and H. Dehmelt. "Shelved Optical Electron
Amplifier: Observation of Quantum Jumps." Phys. Rev. Lett. 56 (1986):2797.
10. Obermayer, K., G. Mahler, and H. Haken. "Multistable Quantum Systems:
Information Processing at Microscopic Levels." Phys. Rev. Lett. 58 (1987):1792.
11. Obermayer, K., W. G. Teich, and G. Mahler. "Structural Basis of Multistationary Quantum Systems. I. Effective Single-Particle Dynamics." Phys. Rev.
B37 (1988):8096.
12. Peres, A. "Reversible Logic and Quantum Computers." Phys. Rev. A32
(1985):3266.
13. Teich, W. G., K. Obermayer, and G. Mahler. "Structural Basis of Multista-
tionary Quantum Systems. II. Effective Few-Particle Dynamics." Phys. Rev.
B37 (1988):8111.
14. Teich, W. G., G. Anders, and G. Mahler. "Transition Between Incompati-
ble Properties: A Dynamical Model for Quantum Measurement." Phys. Rev.
Lett. 62 (1989):1.
15. Teich, W. G., and G. Mahler. "Optically Controlled Multistability in Nanos-
tructured Semiconductors." Physica Scripta 40 (1989):688.
16. Wolfram, S. "Theory and Applications of Cellular Automata." Singapore:
World Scientific, 1986.
17. Zurek, W. H. "Reversibility and Stability of Information Processing Sys-
tems." Phys. Rev. Lett. 53 (1984):391.
Tommaso Toffoli
MIT Laboratory for Computer Science, Cambridge, MA 02139
1. INTRODUCTION
One often speaks, generically, of 'the laws of physics.' The physicist, however, is
well aware that different kinds of laws have a different status, and according to
their status are meant to play a different role in both theory and applications. The
major status categories are roughly as follows.
■ Analytical mechanics. Here we have physics' "constitution"—the principles of
classical mechanics, relativity, quantum mechanics. When we say, "Let's consider a Hamiltonian of this form," we do not pretend that a physical system
governed by such a law actually exists; we merely imply that the law would not
be struck down by physics' supreme court as "unconstitutional."
■ Fundamental processes. Here we have those physical interactions that are actually observed and that presumably belong to nature's most fundamental repertoire. They are the "op-codes" (using Margolus' metaphor) which the Supreme
Architect actually decided to include in physics' machine language. We tentatively assume that, as in the design of a computer chip, other choices of
op-codes were potentially available and could have been equally effective.
Of course, some "grand unified theory" may later show that what appeared to
be independent choices at the op-code level are actually forced consequences
of a single master choice. Moreover, we may realize that what we thought
was a primitive op-code is actually implemented as a higher-level construct—a
"subroutine call." But we are all familiar with this kind of issues from experience
with man-made worlds.111
n Statistical mechanics. Here we have laws that emerge out of the collective
behavior of a large number of elements. The quantities involved in these laws
may not even be meaningful for individual systems or experiments. Intuitively,
one may expect that almost every detail of the microscopic interactions will
be washed out by macroscopic averaging; only features that are supported by
a definite conspiracy (such as a particular symmetry or conservation law) will
bubble up all the way to the macroscopic surface and emerge as recognizable
statistical laws.
In the past few decades, an enormous range of complex physical phenomena
have been successfully explained as inexorable statistical-mechanical consequences
of known fundamental processes or plausible stylizations thereof. Without doubt,
the reduction of phenomenology to fundamental processes via statistical mechanics
is today one of the most productive paradigms (cf. Kuhn4) of mathematical physics.
Explaining the texture of mayonnaise has become a likely subject for ten articles
in Physical Review, and no one would be surprised if its mathematics turned out
to be isomorphic to that needed to explain the fine structure of quarks.
This work on collective phenomena has revealed principles that appear to have a
universal and fundamental character not unlike that of the principles of mechanics.
In this paper, we shall turn the tables and ask, "Are perhaps the very principles
of mechanics so universal and fundamental just because they are emergent aspects
of an extremely fine-grained underlying structure, and thus chiefly mathematical
rather than physical in content?"
A coin, no matter what its composition, shape, or tossing technique, can be
characterized by a single real parameter k such that over a large number of trials
[1]The choice of op-codes for, say, the IBM/360 family of computers reveals strong constraints of
economy, consistency, and completeness. And in the cheapest models of this family many of the
op-codes documented in the machine-language manual are emulated by software traps rather than
directly implemented in hardware; the timing may be different, but the logic is identical.
How Cheap Can Mechanics' First Principles Be? 303
it will come up heads close to a fraction k of the time. The existence of such a
parameter is not usually regarded as a property of our physical world per se—a
choice made by God when establishing the laws of nature; rather, it is seen as
a mathematical consequence of almost any choice about physics God could have
made.
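The coin's statistical regularity is easy to exhibit numerically; a minimal sketch (the bias k = 0.3 and the trial counts are arbitrary choices):

```python
import random

def heads_fraction(k: float, trials: int, seed: int = 0) -> float:
    """Toss a coin of bias k (probability of heads) `trials` times
    and return the observed fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < k for _ in range(trials))
    return heads / trials

# Over a large number of trials the fraction approaches k,
# regardless of how the "coin" is implemented.
for trials in (100, 10_000, 1_000_000):
    print(trials, heads_fraction(0.3, trials))
```

The convergence is a consequence of the law of large numbers, not of any particular choice of coin.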
In the same vein, one would like to ask whether, for instance, the laws of
mechanics are symplectic because God explicitly decided to make them so, or
whether this symplectic character automatically follows out of virtually any reason-
able choice of fine-grained first principles. Similarly, can one think of simple ground
rules for physics whereby relativity would appear as little surprising as the law of
large numbers?
In this paper, we shall give some circumstantial evidence that questions of the
above kind are scientifically legitimate and intellectually rewarding. Namely, we'll
look at a number of physical concepts that are usually regarded as primitive, and
in each case we'll show a plausible route for reduction to much simpler concepts.
2. CONTINUITY
Both in the classical and the quantum description, the state of a physical sys-
tem evolves as a continuous function of time. In mathematics, it is well known that
FIGURE 1 (a) Particles on a lattice. (b) Density plot along a line y = const.
certain discrete constructs (e.g., the distribution of prime numbers, π(n)) can be
approximated by continuous ones (in this example, the Riemann function R(x)).
However, continuity does not invariably emerge from discreteness through some
universal and well-understood mechanism, so that, when it does, we are justified
in asking why. Once we understand the reasons in one case, we may hope to derive
sufficient conditions for its emergence in a more general situation. Here we'll give
an example of sufficient conditions in a kinematical context.
Consider an indefinitely extended two-dimensional lattice of spacing λ, having
a 1 ("particle") or a 0 ("vacuum") at each site, as in Figure 1(a). As we move, say,
along the x axis, the microscopic density function ρ(x, y) will display the discontinuous behavior of Figure 1(b).
Let us define a whole sequence ρₙ of new density functions, with ρₙ(x, y) denoting the average density over the square window of side nλ centered at x, y.
For example, ρ₃ can take any of the 10 values 0, 1/9, 2/9, ..., 8/9, 1 (Figure 2(a)).
However, as x increments by λ—and the corresponding window slides one lattice
position to the right—ρ₃ cannot arbitrarily jump between any two of these values;
the maximum size of a jump is 1/3 (Figure 2(b)).
In general, while ρₙ depends on the number of particles contained in the entire
window (volume effect), the change Δρₙ corresponding to Δx = ±λ depends only
on the number of particles swept by the edge of the window (surface effect); thus,

|Δρₙ| ≤ 1/n (1)
and

lim_{n→∞} Δρₙ = 0 . (2)
If now we let the lattice spacing λ decrease in the same proportion as n increases (so
that the area of the window remains constant), in the limit as n → ∞ the sequence
ρₙ converges to a uniformly continuous function of x, y.
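The volume/surface distinction can be checked directly on a random lattice: sliding an n-by-n window one site along x exchanges at most one column of n sites, so the density can change by at most n/n² = 1/n. A minimal sketch (lattice size and window sides are arbitrary, and sites rather than physical units are used):

```python
import random

def window_density(lattice, x0, y0, n):
    """Average occupation over the n-by-n window with corner (x0, y0)."""
    return sum(lattice[y0 + dy][x0 + dx]
               for dy in range(n) for dx in range(n)) / (n * n)

def max_jump(lattice, n, row=0):
    """Largest |density change| as the window slides one site along x."""
    width = len(lattice[0])
    vals = [window_density(lattice, x, row, n) for x in range(width - n)]
    return max(abs(b - a) for a, b in zip(vals, vals[1:]))

rng = random.Random(1)
lat = [[rng.randint(0, 1) for _ in range(200)] for _ in range(40)]

# The jump is a surface effect: it is bounded by 1/n, and so vanishes
# as the window grows.
for n in (3, 9, 27):
    print(n, max_jump(lat, n), "<=", 1 / n)
```

For n = 3 the bound is the 1/3 quoted in the text.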
The above considerations involving a static configuration of particles are trivial.
Now, let us introduce an arbitrary discrete dynamics (τ will denote the time spacing
between consecutive states), subject only to the following constraints:
n Locality. The state of a site at time t + τ depends only on the state at time t
of the neighboring sites.
n Particle conservation. The total number of particles is strictly conserved.
In one time step, only particles that are lying next to the window's border can
move in or out of the window: the dependency of ρₙ on t, much as that on x, is
a surface effect. If in taking the above limit we let the time spacing τ shrink in
the same proportion as the lattice spacing λ, so as to leave the "speed of light" (one
site per step) constant, the sequence ρₙ(x, y; t) converges to a uniformly continuous
function of t.
Note that, if either locality or particle conservation did not hold, ρₙ as
a function of time would not, in general, converge to a definite limit. Thus, we
have characterized a situation where the emergence of a continuous dynamics is
reducible to certain general properties of a (conceptually much simpler) underlying
fine-grained dynamics.
Is that the store where physics buys continuity? Who knows—but Occam would
say it's a good bet!
3. VARIATIONAL PRINCIPLES
In order to explicitly construct the evolution of an arbitrary dynamical system over
an indefinitely long stretch of time one needs laws in vectorial form. In the time-
discrete case, a vectorial law gives the next state u_{t+1} of the system as a function
of the current state u_t,

u_{t+1} = F u_t ; (3)
though in many cases of interest F can be captured by a more concise algorithm,
full generality demands that F be given as an exhaustive lookup table, since its
values for different values of u can in principle be completely arbitrary.
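The lookup-table character of F can be made literal; a toy sketch (the six-point state space and the particular table are arbitrary choices):

```python
# A vectorial law u_{t+1} = F(u_t) given as an exhaustive lookup table:
# here an arbitrary invertible dynamics on a six-point state space.
F = {0: 3, 1: 4, 2: 5, 3: 1, 4: 0, 5: 2}   # a permutation, i.e. invertible

def evolve(u0, steps):
    """Iterate the tabulated law for `steps` time steps."""
    u = u0
    for _ in range(steps):
        u = F[u]
    return u

print([evolve(0, t) for t in range(8)])   # → [0, 3, 1, 4, 0, 3, 1, 4]
```

The state 0 lies on an orbit of period 4; no formula more concise than the table itself is implied.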
In the continuous case, a vectorial law gives the rate of change of the current
state u(t),

(d/dt) u = f(u(t)) ; (4)
This compression factor of 2 (of 2n, for n degrees of freedom) is attained at a cost.
To obtain the current value of dq/dt, it is no longer enough to look at a single entry
of a table, as in Eq. (5); in fact, one has to determine the trend of H for variations
of p in the vicinity of the current value of (q, p), and this entails looking up a whole
range of entries. Compressed data save memory space, it is true, but entail more
computational work.
FIGURE 3 An energy variation, dE, and the corresponding action variation, dS.
dS/dT = T n(T) . (8)
Therefore, the original relation (7) will hold if and only if the orbit-length
distribution is of the form
n(T) = 1/T . (10)

n_N(T) = 1/T (12)

for any N.
In fact, we construct a specific orbit of length T by choosing T states out
of N and arranging them in a definite circular sequence. This can be done in
C(N, T) T!/T different ways. To know in how many elements of the ensemble the orbit
thus constructed occurs, we observe that the remaining N − T elements can be
connected in (N − T)! ways. Thus, the total number of orbits of length T found
anywhere in the ensemble is

C(N, T) (T!/T) (N − T)! = N!/T . (13)
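The ensemble X_N of invertible dynamics on N states is just the set of N! permutations of the state space, so the expected number of orbits (cycles) of length T per system should be 1/T, in agreement with Eq. (12). A quick empirical check (the values of N and the sample size are arbitrary):

```python
import random
from collections import Counter

def cycle_lengths(perm):
    """Lengths of the orbits (cycles) of a permutation given as a list."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = perm[x]
            length += 1
        lengths.append(length)
    return lengths

rng = random.Random(0)
N, samples = 50, 20_000
counts = Counter()
for _ in range(samples):
    perm = list(range(N))
    rng.shuffle(perm)
    counts.update(cycle_lengths(perm))

# Average number of orbits of length T per system should be close to 1/T.
for T in (1, 2, 5, 10):
    print(T, counts[T] / samples, "vs", 1 / T)
```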
Given two points a and b, one can tell whether b can be reached from a in
t steps; in particular (for t = 0), one can tell whether or not a = b.
independent of the labeling of the points, and thus preserved by any isomorphism.
Thus, for instance, one can tell how many orbits of period T are present, but of these
one cannot single out an individual one without actually pointing at it, because they
all "look the same."
To see whether there is a quantity that can be meaningfully called "energy"
in this context, let us observe that physical energy is a function E, defined on the
state space, having the following fundamental properties:
1. Conservation. E is constant on each orbit (though it may have the same value
on different orbits).
2. Additivity. The energy of a collection of weakly coupled system components
equals the sum of the energies of the individual components.
3. Generator of the dynamics. Given the constraints that characterize a particular
class of dynamical systems, knowledge of the function E allows one to uniquely
reconstruct, up to an isomorphism, the dynamics of an individual system of
that class.
The proposed identification E = log T obviously satisfies property 1.
As for property 2, consider a finite system consisting of two independent components, and let a₀ and a₁ be the respective states of these two components. Suppose
for a moment that a₀ is on an orbit of period 3, and a₁ on one of period 7; then the
overall system state (a₀, a₁) is on an orbit of length 21, i.e., log T = log T₀ + log T₁.
This argument would fail if T₀ and T₁ were not coprime. However, for randomly chosen integers the expected number of common factors grows extremely slowly with
the size of the integers themselves,7 so that approximate additivity holds almost
always.
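The additivity argument can be checked directly: the joint orbit has period lcm(T₀, T₁), which equals T₀T₁ exactly when the periods are coprime. A small sketch:

```python
from math import gcd, log

def joint_period(t0: int, t1: int) -> int:
    """Period of the combined state (a0, a1) of two independent
    components with orbit periods t0 and t1: the least common multiple."""
    return t0 * t1 // gcd(t0, t1)

# Coprime periods: energies E = log T add exactly.
assert joint_period(3, 7) == 21
print(log(21), "=", log(3) + log(7))

# Non-coprime periods: additivity fails, but only by log(gcd).
t0, t1 = 6, 10
err = log(t0) + log(t1) - log(joint_period(t0, t1))
print(err, "=", log(gcd(t0, t1)))
```

Since common factors of random integers are typically small, the error term log(gcd) is typically negligible compared with log T₀ + log T₁.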
As for property 3, an individual system of XN is completely identified—up to
an isomorphism—by its distribution n(T), and thus any one-to-one function of T (in
particular, E = log T) satisfies this property.
Note that the ensemble XN consists of all invertible systems on a state space
of size N. If we placed further constraints on the make-up of the ensemble, i.e.,
if we restricted our attention to a subset of systems having additional structure,
some of the above arguments may cease to be valid. For example, while it is true
that for large N almost all subensembles of XN retain distribution (Eq. (12)), in
a few "perverse" cases the distribution will substantially depart from 1/T, and, if
we still assume that E = log T, Eq. (7) may fail to hold. Moreover, systems that
were isomorphic within XN may no longer be so when more structure is introduced;
to allow us to tell that two systems are intrinsically different, the energy function
may have to be "taught" to make finer distinctions between states than just on the
basis of orbit length. But all this is beside the point we are making here; a fuller
discussion of these issues will be found in Toffoli.10
3.3 CONCLUSIONS
The fact that a specific variational principle of mechanics emerges quite naturally,
via statistical averaging, from very weak information-mechanical assumptions, does
not tell us much about what fine-grained structure, if any, may actually underlie
traditional physics; the relevant point is that we come to recognize that such a
principle happens to be of the right form to be an emergent feature. When we see
a Gaussian distribution in a sequence of heads and tails, we can't really tell what
coin is being tossed, but conceptual economy will make us guess that somebody is
tossing some kind of coin, rather than concocting the sequence by explicit use of
the Gaussian function.
4. RELATIVITY
The fact that the physics of flat spacetime is Lorentz, rather than Galilean, invariant
is usually treated as an independent postulate of physics, much as Euclid's fifth
axiom in geometry. In other words, God could have chosen differently; Lorentz
invariance has to be acknowledged, not derived.
However, if we look at the most naive models of distributed computation, we
see that Lorentz invariance naturally emerges as a statistical feature, and admits
of a very intuitive information-mechanical interpretation. Much as in the previous
section, we do not want to claim that this is the way relativity comes about in
nature; we just want to stress that the mathematics of relativity happens to lie in
one of those universality classes that arise from collective phenomena.
4.1 ORIENTATION
Consider the two-dimensional random walk on the x, y lattice. At the microscopic
level, this dynamics is not rotation invariant (except for multiples of a quarter-turn
rotation); however, invariance under the continuous group of rotations emerges at
the macroscopic level (Fig. 5). In fact, for r² = x² + y² ≪ t and in the limit as
t → ∞, the probability distribution P(x, y; t) for a particle started at the origin
converges to

P(x, y; t) ≈ (1/2πt) e^{−r²/2t} , (14)

i.e., it depends on x and y only through x² + y² = r².
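The emergent isotropy can be checked exactly rather than by sampling, assuming the standard four-neighbor walk: in the diagonal coordinates u = x + y, v = x − y the walk decomposes into two independent ±1 walks, so P(x, y; t) factorizes into binomials and can be compared at lattice points with equal r² but different directions. A sketch:

```python
from math import comb

def simple_walk_pmf(k: int, t: int) -> float:
    """Probability that a ±1 random walk is at k after t steps."""
    if (t + k) % 2 or abs(k) > t:
        return 0.0
    return comb(t, (t + k) // 2) / 2 ** t

def p2d(x: int, y: int, t: int) -> float:
    """Exact occupation probability of the 4-neighbor lattice walk,
    via the independent diagonal coordinates u = x+y, v = x-y."""
    return simple_walk_pmf(x + y, t) * simple_walk_pmf(x - y, t)

# Same r^2 = 25, different directions: axis point vs. off-axis point.
t = 101
print(p2d(5, 0, t), p2d(3, 4, t))   # nearly equal: isotropy at large t
```

At t = 101 the two probabilities agree to within a fraction of a percent, even though the microscopic rule singles out the lattice axes.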
Now, there is a strict formal analogy between a circular rotation by an angle θ
in the x, y plane and a Lorentz transformation with velocity β in the t, x plane—
which can be written as a hyperbolic rotation by a rapidity θ = tanh⁻¹ β:

( t' )   ( cosh θ   sinh θ ) ( t )
( x' ) = ( sinh θ   cosh θ ) ( x ) . (15)
Riding on this analogy, one may hope to find a microscopic dynamics on the t, x
lattice for which Lorentz invariance (which is out of the question at the microscopic
level) would emerge at the macroscopic level.
Let's look first at the one-dimensional random walk on a lattice, with probability p of moving to the right and q = 1 − p of moving to the left. For p = q = 1/2, the
evolution of the resulting binomial distribution is characterized, macroscopically, by
a mean μ = 0 and a standard deviation σ = √t (Figure 6(a)).
In general, μ = (p − q)t. If we shift the parameter p away from its center value
of 1/2, the center of mass of the distribution will start moving at a uniform velocity
β = p − q. Let's try to offset this motion by a Galilean transformation
FIGURE 6 (a) Symmetric random walk (p = 1/2). (b) Asymmetric random walk
(p = 3/4); note that, as the center of mass picks up a speed β = p − q, the rate of
spread goes down by a factor 1 − β².
Macroscopically, the new system will evolve, in the new frame, just as the old
system did in the old frame—except that now σ = √(4pq t) = √((1 − β²)t), so that
the diffusion will appear to have slowed down by a factor 1 − β² (Fig. 6(b)).
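The slowdown factor can be verified exactly: with unit steps (the text's conventions may differ by a constant factor), the variance per step of the displacement is 1 − (p − q)² = 4pq, so the variance after t steps is (1 − β²)t. A sketch using exact enumeration over the binomial distribution rather than sampling:

```python
from math import comb

def walk_moments(p: float, t: int):
    """Exact mean and variance of a ±1 random walk (probability p of
    stepping right) after t steps, from the binomial distribution."""
    mean = var = 0.0
    for r in range(t + 1):              # r = number of right moves
        prob = comb(t, r) * p**r * (1 - p)**(t - r)
        x = 2 * r - t                   # net displacement
        mean += prob * x
        var += prob * x * x
    return mean, var - mean**2

t = 400
for p in (0.5, 0.75):
    beta = 2 * p - 1
    mean, var = walk_moments(p, t)
    # drift beta*t; diffusion slowed by the factor 1 - beta^2
    print(p, mean, beta * t, var, (1 - beta**2) * t)
```

The drift "uses up" part of the step budget, and the diffusion constant shrinks by exactly 1 − β².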
Intuitively, as some of the resources of the "random walk computer" are shifted
toward producing coherent macroscopic motion (uniform motion of the center
of mass), fewer resources will remain available for the task of producing incoherent motion (diffusion). Thus, we get a slowdown reminiscent of the Lorentz-Fitzgerald "time expansion." In the present situation, however, the slowdown factor
is 1 − β², related to, but different from, the well-known relativistic factor √(1 − β²);
the transformation that will restore invariance of the dynamics in this case is
a Lorentz transformation followed by a scaling of both axes by a further factor
√(1 − β²).
μ and σ arise in this model from changes in the initial distribution of microscopic
states, rather than by tampering with the microscopic laws. This model is exactly
Lorentz invariant in the continuum limit, i.e., as the lattice spacing λ goes to zero.
Let us consider a one-dimensional cellular automaton having the format of
Fig. 7(a). This is a regular spacetime lattice, with a given spacing λ (lattice units
per meter). The arcs represent signals traveling at unit speed (the "speed of light");
the nodes represent events, i.e., interactions between signals. If one of the possible
signal states, denoted by the symbol 0, is interpreted as denoting the vacuum, the
remaining states can be interpreted as particles traveling on fixed spacetime tracks
(the arcs) and interacting only at certain discrete loci (the nodes). Such a system
can be thought of as a lattice gas (cf. Hardy et al.3 and Toffoli and Margolus8).
Here, we will allow no more than one particle on each track. When two particles
collide, each reverses its direction (Fig. 7(b)). As long as particles are identical (say,
all black), this reversal is indistinguishable from no interaction (Fig. 7(c)).
Now let us paint just one particle red (in which case the reversal does make
a difference), and study the evolution of its probability distribution p(x; t) when
both right- and left-going particles are uniformly and independently distributed
with linear density (particles per meter) s = n/λ—where n is the lattice occupation
density (particles per track).
FIGURE 7 (a) One-dimensional lattice, unfolded over time. The tracks, with slope ±1,
indicate potential particle paths; the nodes indicate potential collision loci. (b) Bouncing
collision. (c) No-interaction collision.
∂p/∂t = (1/2s) ∂²p/∂x² (18)

in the same circumstances (i.e., t → ∞, |x| ≪ √t) as the binomial distribution does.
We shall now introduce the freedom to independently vary the densities s₊, s₋ of,
respectively, right- and left-going particles; as a consequence, the red particle's
distribution's center of mass will drift, and its diffusion rate will be affected, too—
much as in the asymmetric random walk case. However, this time we have strict
Lorentz invariance (in the continuum limit): to every Lorentz transformation of the
coordinates, t, x ↦ t′, x′, there corresponds a similar linear transformation of the
initial conditions, s₊, s₋ ↦ s′₊, s′₋, that leaves the form of p invariant. (Indeed, the
telegrapher's equation is just another form of the Klein-Gordon equation used in
relativistic quantum mechanics.)
Lorentz invariance emerges in a similar way for a much more general class of
dynamics on a lattice, as explained in Toffoli9; more generally, features qualitatively
similar to those of special relativity appear whenever fixed computational resources
have to be apportioned between producing the inertial motion of a macroscopic
object as a whole and producing the internal evolution of the object itself (cf.
Chopard2). Thus, we conjecture that special relativity may ultimately be derived
from a simpler and more fundamental principle of conservation of computational
resources.
Is this really an unfortunate state of affairs? After all, we know that physics
itself starts deviating from special relativity when one dumps more and more matter
in the same volume. Are we witnessing the emergence of general relativity? Indeed,
the slowdown of the macroscopic evolution brought about, in models of the above
kind, by the "crowding" of the computational pathways, is strikingly analogous
to the proper-time dilation that, in physics, is brought about by the gravitational
potential.
Without more comprehensive models, precise interpretation rules, and quantitative results, any claim that the present approach might have anything to do
with modeling general relativity is, of course, premature. But it is legitimate to ask
whether fine-grained computation in uniform networks has at least the right kind
of internal resources for the task. In other words, is the emergence plausible, in
such systems, of a dynamics of spacetime analogous to that described by general
relativity? And how could it come about?
Let us start with a metaphor. On a strip of blank punch tape we can record
information at a density of, say, ten characters per inch. What if we could only avail
ourselves of used tape, found in somebody's wastebasket? Knowing the statistics of
the previous usage, one can devise appropriate group encoding techniques and error
correcting codes so as to make such a used tape perfectly adequate for recording
new information (cf. Rivest and Shamir6)—at a lower density, of course, i.e., up
to the maximum density allowed by Shannon's theorems for a noisy channel. The
proper length of the tape, defined in terms of how many characters we can record on
it, will be less than that of blank tape, by a factor that will depend on how heavy
the original usage was. If the tape is sufficiently long, its statistical properties may
significantly vary from place to place, and we may want to adapt our encoding
strategy to the local statistics—yielding a proper-length metric that varies from
place to place.
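Rivest and Shamir's point can be made concrete with their classic toy scheme (as commonly presented; the details below are a reconstruction), which records 2 bits twice in 3 write-once cells, where punched holes can be added but never removed:

```python
# First-generation codewords have weight <= 1; the second generation
# uses their bitwise complements (weight >= 2). Two writes are guaranteed.
GEN1 = {(0, 0): (0, 0, 0), (0, 1): (1, 0, 0),
        (1, 0): (0, 1, 0), (1, 1): (0, 0, 1)}
GEN2 = {v: tuple(1 - b for b in w) for v, w in GEN1.items()}

def decode(tape):
    """Read the 2-bit value currently stored on the 3-cell tape."""
    table = GEN1 if sum(tape) <= 1 else GEN2
    return next(v for v, w in table.items() if w == tape)

def write(tape, value):
    """Record `value` on the write-once tape; bits may only go 0 -> 1."""
    target = tape if decode(tape) == value else (
        GEN1[value] if sum(tape) == 0 else GEN2[value])
    assert all(t >= c for t, c in zip(target, tape)), "cannot unpunch a hole"
    return target

tape = (0, 0, 0)
tape = write(tape, (0, 1))      # first write
tape = write(tape, (1, 0))      # overwrite with a different value
print(tape, decode(tape))       # → (1, 0, 1) (1, 0)
```

Four bits of information fit into three cells, at the price of extra decoding logic: exactly the capacity/effort trade-off the metaphor describes.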
Let us extend the above metaphor from the domain of information statics to
that of information dynamics. Consider, for example, a programmable gate array
having a nominal capacity of, say, 10,000 gates. An inventor designs a clever arcade
game that takes full advantage of the chip's "computing capacity," and asks the
VLSI factory to produce a million copies of it. The game turns out to be a flop, and
the programmed chips get thrown in the waste basket. What is the effective "com-
puting capacity" of these chips from the viewpoint of the penniless but undaunted
hacker who finds them? How many of these chips would he have to put together in
order to construct his own arcade game, and how many clock cycles of the original
chip would he have to string together to achieve a usable clock cycle for his game?
What in the new game is simply the toggling of a flip-flop may correspond, in the
underlying original game, to the destruction of a stellar empire. For the new user,
proper time will be measured in terms of how fast the evolution of his game can be
made to proceed.
For a macroscopic scavenger, the individual hole positions in a punched tape or
the individual gates in an electronic circuit blend into a continuum, locally charac-
terized by a certain effective density of information-storage capacity and a certain
4.4 CONCLUSIONS
Quantitative features of special relativity and at least qualitative features of general
relativity emerge quite naturally as epiphenomena of very simple computing net-
works. Thus, relativity appears to be of the right form to be an emergent property,
whether or not that is the way it comes about in physics.
5. GENERAL CONCLUSIONS
Many of what are regarded as the most fundamental features of physics happen
to have the right form to be emergent features of a much simpler fine-grained
dynamics.[4]
A century and a half ago, most people were happy with the idea that the cell
was a bag of undifferentiated "protoplasm" governed by some irreducible "vital
force." The behavior of the cell was obviously very rich, but few people dared to
ascribe it to much finer-grained internal machinery, explicitly built according to
immensely detailed blueprints.
Today we know for sure about the existence of such machinery and such
blueprints. Besides molecular genetics, chemistry and nuclear physics provide fur-
ther case histories where complex behavior was successfully reduced to simpler
primitives on a grain a few orders of magnitude finer.
For a physicist, the possibility of explanation by reduction to simpler, smaller
structures is of course one of the first things that comes to mind. The point of
this paper is that one should look for such possibility not only to explain specific
phenomenology, but also to re-examine those general principles that are so familiar
that no "explanation" seems to be needed.
[4]Even invertibility—perhaps the most strongly held feature of microscopic physics—can quite
naturally emerge out of an underlying noninvertible dynamics. We are going to discuss this topic
in a separate paper.
ACKNOWLEDGMENTS
This research was supported in part by the Defense Advanced Research Projects
Agency (N00014-89-J-1988), and in part by the National Science Foundation
(8618002-IRI).
REFERENCES
1. Arnold, Vladimir. Mathematical Methods of Classical Mechanics. Berlin:
Springer-Verlag, 1978.
2. Chopard, Bastien. "A Cellular Automata Model of Large-Scale Moving Ob-
jects." Submitted to J. Phys. A (1989).
3. Hardy, J., O. de Pazzis, and Yves Pomeau. "Molecular Dynamics of a Clas-
sical Lattice Gas: Transport Properties and Time Correlation Functions."
Phys. Rev. A13 (1976):1949-1960.
4. Kuhn, Thomas. The Structure of Scientific Revolutions, 2nd edition.
Chicago: Univ. of Chicago Press, 1970.
5. Margolus, Norman. "Physics and Computation." Ph.D. Thesis, Tech. Rep.
MIT/LCS/TR-415, MIT Laboratory for Computer Science, 1988.
6. Rivest, Ronald, and Adi Shamir. "How to Reuse a 'Write-Once' Memory."
Info. and Control 55 (1982):1-19.
7. Schroeder, Manfred. Number Theory in Science and Communication, 2nd en-
larged edition. Berlin: Springer-Verlag, 1986.
8. Toffoli, Tommaso, and Norman Margolus. Cellular Automata Machines—A
New Environment for Modeling. Cambridge: MIT Press, 1987.
9. Toffoli, Tommaso. "Four Topics in Lattice Gases: Ergodicity; Relativity; Information Flow; and Rule Compression for Parallel Lattice-Gas Machines."
In Discrete Kinetic Theory, Lattice Gas Dynamics and Foundations of Hydro-
dynamics, edited by R. Monaco. Singapore: World Scientific, 1989, 343-354.
10. Toffoli, Tommaso. "Analytical Mechanics from Statistics: T = dS/dE Holds
for Almost Any System." Tech. Memo MIT/LCS/TM-407, MIT Laboratory
for Computer Science, August 1989.
Xiao-Jing Wang
Center for Studies in Statistical Mechanics, University of Texas, Austin, TX 78712 (current
address: Mathematical Research Branch, NIDDK, National Institutes of Health, Bldg. 31,
Room 4B-54, Bethesda, MD 20892, USA)
I. INTRODUCTION
We shall summarize here succinctly some recent progress in our understanding of
intermittent phenomena in physics. Intermittency often refers to random, strong
deviations from regular or smooth behavior. Consider, for instance, an iterative
dynamical system (Figure 1)
FIGURE 1 The Manneville-Pomeau map, with a countable partition of the phase space
(0,1).
and the behavior is then said to be chaotic. An entropy is also well defined for
dynamical systems, thanks to A. Kolmogorov and Y. Sinai.6 The idea is that a
deterministic chaotic system admits a discrete generating partition of its phase
Intermittent Fluctuations and Complexity 321
which implies λₙ/n → 0, and the dynamic stability is stretched exponential rather
than exponential. This kind of behavior was called "sporadic."8 We shall show that
it represents a special class of intermittent systems, with the algorithmic complexity
of Kolmogorov and Chaitin of a form intermediate between the predictable and random cases.
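A short numerical sketch, assuming the Manneville-Pomeau map has the standard form x_{n+1} = x_n + x_n^z (mod 1) (an assumption, since the equation itself is not reproduced above), exhibits the long laminar phases behind this behavior: near the marginal fixed point at x = 0, the escape time from a starting point x₀ grows roughly like 1/x₀ for z = 2.

```python
def mp_step(x: float, z: float) -> float:
    """One iterate of the (assumed) Manneville-Pomeau map."""
    return (x + x ** z) % 1.0

def escape_time(x0: float, z: float, threshold: float = 0.5) -> int:
    """Number of iterates needed to leave the laminar region [0, threshold)."""
    x, n = x0, 0
    while x < threshold:
        x = mp_step(x, z)
        n += 1
    return n

# For z = 2 the continuum estimate dx/dn = x^2 gives n ~ 1/x0:
for x0 in (1e-2, 1e-3, 1e-4):
    print(x0, escape_time(x0, 2.0))
```

Halving x₀ roughly doubles the dwell time, so reinjections near 0 produce the heavy-tailed laminar episodes responsible for the intermittent statistics discussed below.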
A fully disordered or random system is sometimes perceived as simple rather
than complex, mostly when its fluctuations appear small and inconspicuous. On the
other hand, intermittency is generally characterized by abnormal fluctuations
and a 1/f-noise-like power spectrum,1,9 and local fluctuations around the mean of
an observable may obey a Lévy, rather than a Gauss, distribution. More complete
information is provided by knowledge about large fluctuations, using the thermo-
dynamic formalism of Sinai, Ruelle and Bowen for the dynamical systems.16 The
SRB theory furnished a rigorous connection between dynamical systems and equi-
librium statistical mechanics in one dimension (on the time axis). In this framework
abnormal behaviors of large deviations in Eq. (1) are to be treated as a problem of
phase transition in its statistical mechanical counterpart.20,21
To sum up, one can look at the mean, local fluctuations, large deviations, and
equilibrium statistical mechanics, in order to achieve increasingly detailed descrip-
tions of intermittent systems. Furthermore, one can also study its unusual nonequi-
librium properties, following a suggestion of G. Nicolis, et al.14,15 In what follows
we shall take Eq. (1) for a case study to illustrate how each of these levels of con-
sideration leads to insights into new aspects of fluctuations and complexity of the
intermittent processes. Sections II and III are devoted to the equilibrium properties of
Eq. (1); and in Section IV nonequilibrium is discussed. Some other examples with
close analogy to the intermittent system will be mentioned in Section V, including
a discrete one-dimensional model of Anderson localization in disordered matter.
Initially observed in fluid turbulence, intermittency has more recently been
evidenced in other physical systems as diverse as the large-scale structure of the
universe and the hadronic multiparticle production in high-energy physics. Little is
known, as yet, beyond the phenomenology of these spatially extended processes.
This "predictable" class of strings includes periodic ones (for which it suffices to
specify the pattern of one period in the program), as well as a large set of aperiodic
ones. An often-cited example is the digital expansion of the mathematical constant
π = 3.141592..., for which a convergent series representation exists (e.g., in
the sum of S. Ramanujan's series each successive term adds roughly eight correct
digits).
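The predictable/random distinction can be illustrated, imperfectly, with an ordinary compressor standing in for the shortest program: a periodic string compresses to almost nothing ("copy the period"), while a seeded-random bit string stays near one bit per symbol.

```python
import random
import zlib

def compressed_size(s: str) -> int:
    """Length in bytes of the zlib-compressed string (a crude,
    computable stand-in for Kolmogorov complexity)."""
    return len(zlib.compress(s.encode(), 9))

periodic = "0110" * 2500                      # 10,000 symbols, period 4
rng = random.Random(42)
noisy = "".join(rng.choice("01") for _ in range(10_000))

print(len(periodic), compressed_size(periodic))   # tiny: "copy the period"
print(len(noisy), compressed_size(noisy))         # ~ n/8 bytes: copy the bits
```

A general-purpose compressor only bounds the true complexity from above, but the gap between the two cases is already several orders of magnitude.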
On the other hand, for a random sequence where no regularity can be found,
one could only "copy" the whole string, bit by bit, so that
with the transition probability p₀ₙ, and the invariant measure μ(Aₙ), fulfilling

p₀ₙ ~ 1/n^{1+α} ; μ(Aₙ) ~ 1/n^α . (10)
with s₀ = 0. Using the theory of recurrent events, one can then show that

E(K(Sₙ)) ~ n             if 3/2 < z < 2 ;
E(K(Sₙ)) ~ n^{1/(z−1)}   if 2 < z ;        (14)
E(K(Sₙ)) ~ n / log n     if z = 2 .
For z < 3/2, there exists a central limit theorem asserting the Gaussian character of the fluctuations. For 3/2 < z, on the other hand, they obey a generalized limit
theorem involving the Lévy stable distribution g_α(x) with 0 < α < 2.13 (g_{α=2}(x) is
the familiar Gauss law. A Lévy distribution with α < 2 enjoys the same kind of
genericity as a Gauss distribution for sums of independent random variables with a common
distribution, only now this latter distribution has an infinite second moment.)
In both cases, the correlation function is a power law. This tells us that local
fluctuations are not sufficient to characterize the abnormality of the system. In fact,
a central limit theorem is concerned with fluctuations of the form

(1/n) Σ_{k=0}^{n-1} g(x_k) − E(g) ≤ c σ/√n , −∞ < c < +∞ . (18a)
where E(g) stands for the mean value of an observable g(x). Instead of (18a), one
can consider large deviations, i.e.,

(1/n) Σ_{k=0}^{n-1} g(x_k) ∈ (A, A + dA) (18b)
for all possible values A, not necessarily near its mean value. If g(x) = log|f'(x)|, then the left side of Eq. (18b) tends to the Liapounov exponent λ, and one is dealing with

    U_n(s₀s₁s₂⋯s_{n−1}) = (1/n) Σ_{k=0}^{n−1} log|f'(f^{(k)}(x))| ,   x ∈ Δ(s₀s₁s₂⋯s_{n−1}) ,        (19)
where Δ(s₀s₁s₂⋯s_{n−1}) is the cell in the phase space coded by (s₀s₁s₂⋯s_{n−1}).
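Eq. (19) is simply a finite-time (Birkhoff) average of log|f'| along an orbit. A minimal numerical check, using the full tent map as a stand-in example (for it |f'| = 2 almost everywhere, so U_n must equal log 2 on any orbit):

```python
import math

def tent(x):
    """Full tent map on [0, 1]; |f'(x)| = 2 almost everywhere."""
    return 1.0 - abs(1.0 - 2.0 * x)

def u_n(f, log_abs_df, x0, n):
    """U_n = (1/n) * sum_{k=0}^{n-1} log|f'(f^(k)(x0))|, cf. Eq. (19)."""
    x, total = x0, 0.0
    for _ in range(n):
        total += log_abs_df(x)
        x = f(x)
    return total / n

u = u_n(tent, lambda x: math.log(2.0), 0.1234, 1000)
```

For a generic chaotic map the same average converges to the Liapounov exponent as n grows, which is the content of the discussion around Eq. (18b).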
A fundamental result in the SRB theory states that the invariant measure of a dynamical system is given by

    μ(Δ(s₀s₁⋯s_{n−1})) ∼ exp[−n U_n(s₀s₁⋯s_{n−1})] .        (20)
A detailed analysis of the intermittent system (1) (or rather a piecewise linear
approximation of it) is carried out in Wang.20,21 Let us call attention to some main
conclusions therefrom. It was found that its statistical mechanical counterpart bears
a close analogy to a droplet model of condensation proposed by Fisher7 about 25
years ago. The clusters of "laminar" states are similar to the clusters of particles
(the droplets) in Fisher's model of gas-liquid phase transition; there are many-body
interactions within each cluster, which results in a surface energy of logarithmic
type.
On the pressure-temperature plane, there are two thermodynamic phases sep-
arated by a critical line. They are respectively the chaotic ("gas") and periodic
("condensed") states; and the intermittent state is located on the co-dimension one
critical curve of phase transition. This is true for all 1 < z, regardless of whether
the local fluctuations are Gaussian or not. Therefore, the abnormal large fluctua-
tions may be detected as a phase transition of the associated statistical mechanical
system. The identification of the interaction potential and of the types of resulting critical phenomena constitutes the finest characterization and a universal classification of such intermittent dynamical systems.
where f_n^{(i)} denotes the probability of first recurrence at time n of the state i, has a radius of convergence greater than unity.
Applying this theorem to the state A₀ in our case, with the probability of first recurrence given by f_n = p_{0(n−1)} ∼ 1/n^{1+α} [Eq. (10)], immediately implies that the radius of convergence of the sum (21) is unity. Hence, one concludes that the convergence to equilibrium is slower than any exponential law.
Let us indicate why this might be expected from the viewpoint of the spectral
properties of the transition matrix (9). According to the Perron-Frobenius theory,17
a finite nonnegative matrix, say, (a_{ij}), i, j = 1, 2, …, m, possesses a unique maximum eigenvalue λ₀ such that
    min_i Σ_{j=1}^{m} a_{ij} ≤ λ₀ ≤ max_i Σ_{j=1}^{m} a_{ij} .        (22)
For the transition matrix of a finite Markov chain, this sum is Σ_j p_{ij} = 1. Thus, λ₀ = 1, and all the other eigenvalues have a modulus less than 1. Any initial distribution will then be projected onto the (non-negative) eigenvector associated with λ₀, i.e., the invariant measure, and all the other components vanish in an exponential fashion.
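The exponential relaxation described above is easy to exhibit for a finite chain. A minimal sketch, using a hypothetical 3-state row-stochastic matrix (the entries are illustrative, not taken from the text):

```python
def step(dist, P):
    """Propagate a probability distribution one step through a
    row-stochastic transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical 3-state chain; every row sums to 1, so lambda_0 = 1.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

dist = [1.0, 0.0, 0.0]      # arbitrary initial distribution
for _ in range(200):        # components off the invariant measure decay
    dist = step(dist, P)    # exponentially fast (spectral gap below 1)
```

After a few hundred steps the distribution is stationary to machine precision, precisely because all subdominant eigenvalues have modulus strictly less than 1.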
Now, for a countable Markov chain, the transition matrix has denumerably many eigenvalues, so that the eigenvalue λ₀ = 1 may be approached arbitrarily closely by other eigenvalues. This is indeed the case for the model of intermittency. Let us
sketch the argument. Truncating the transition matrix (9) up to n, one obtains a
finite matrix Wn of which the characteristic equation is
    F_n(λ) = λ^n − p₀₀λ^{n−1} − p₀₁λ^{n−2} − ⋯ − p_{0(n−2)}λ − p_{0(n−1)} = 0 ,        (23)

with lim_{n→∞} F_n(λ = 1) = 0. It follows from the Perron-Frobenius theory that all the roots of Eq. (23) are confined inside the unit disc.
Considering F_n(λ) in Eq. (23) as the partial sum of an infinite series, one can readily see that the radius of convergence of this series is 1. Now it is useful to invoke a remarkable theorem in analysis, due to R. Jentzsch,18 which asserts that for every power series, every point of the circle of convergence is a limit-point of zeros of partial sums. Hence, the λ₀ = 1 of the countable chain is a limit-point of
roots of Eq. (23). On the other hand, it is reasonable to assume that, as n → ∞, every such root is arbitrarily near to one of the true eigenvalues of the infinite matrix (9). One concludes therefore that λ₀ = 1 is not isolated. This suggests, although
it does not ensure, that the approach to equilibrium may not have an exponential
rate.
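The drift of the truncated top eigenvalue toward λ₀ = 1 can be checked numerically. The sketch below divides the characteristic equation (23) by λ^n, so that the largest root solves 1 = Σ_t f_t μ^t with μ = 1/λ, and assumes first-return probabilities f_t ∝ t^{−(1+α)}; the value α = 0.5 is an arbitrary illustrative choice.

```python
def truncated_top_eigenvalue(n, alpha=0.5):
    """Largest root of the truncated characteristic equation (23), found
    from 1 = sum_t f_t * mu**t with mu = 1/lambda, where the f_t are
    (approximately normalized) power-law first-return probabilities."""
    norm = sum(t ** -(1.0 + alpha) for t in range(1, 100_000))
    f = [t ** -(1.0 + alpha) / norm for t in range(1, n + 1)]

    def G(mu):  # increasing in mu; G(mu*) = 1 locates the root
        return sum(ft * mu ** t for t, ft in enumerate(f, start=1))

    lo, hi = 1.0, 2.0
    while G(hi) < 1.0:      # bracket the root
        hi *= 2.0
    for _ in range(100):    # bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if G(mid) < 1.0 else (lo, mid)
    return 1.0 / hi         # lambda* = 1/mu* < 1, creeping up to 1

r10, r200 = truncated_top_eigenvalue(10), truncated_top_eigenvalue(200)
```

The larger truncation has a top eigenvalue strictly closer to 1, the numerical signature of the non-isolated λ₀ = 1 and of the slower-than-exponential relaxation.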
V. CONCLUDING REMARKS
There is an increasing number of systems to which the present work appears rele-
vant. One such case, cited in Gaspard and Wang,8 is the Markov model defined on a tree that was proposed by J. Meiss and E. Ott for Hamiltonian chaos, in the presence of a hierarchy of "cantori." Another example is a model of abnormal diffusion
in resistively shunted Josephson junctions, which takes a form similar to Eq. (1).9
Perhaps more surprisingly, it has been noticed3 that a discrete one-dimensional
model of Anderson localization seems also somewhat akin to the intermittent map
Eq. (1). Let us end this paper with a few remarks on this intriguing finding.
The one-dimensional Anderson model is a 1-D discrete Schrödinger equation with a random potential {V_n},

    ψ_{n+1} + ψ_{n−1} − 2ψ_n + V_n ψ_n = E ψ_n ,        (24)

which, in terms of the ratio R_n = ψ_{n+1}/ψ_n, is equivalent to the one-dimensional map

    R_n = E + 2 − V_n − 1/R_{n−1} .        (25)
All the states within the pure energy band [-4, 0] are localized in the presence
of a random potential {Vn}, with the inverse localization length directly given by
the Liapounov exponent of the map (25). For V_n ≡ 0, the map (25) is displayed in Figure 2, with log|f'(R_n)| = −2 log|R_n|. Locally around R = 1, the mapping at the band edge E = 0 looks the same as Figure 1, with z = 2, and it is of great interest to recognize such a resemblance between Anderson localization and intermittency.
There are, however, notable differences which perhaps should not be overlooked.
Contrary to intermittent systems, here the mapping is invertible, and the Liapounov exponent is always zero even for E < 0, if the stochastic term is absent5 (obviously, no localization is possible without a random potential). Besides, due to the linear character of Eq. (24), the Thouless formula asserts a direct relationship between
the inverse localization length and the integrated density of states. Interpreted in
terms of the dynamics in Eq. (25), the former is the Liapounov exponent while the
latter, being related to the number of nodes of the wave function, is also the number
of times that Rn is negative in the lattice, hence assimilable to the "turbulent time"
(cf. Figure 2). Such a "dispersion relation" between the Liapounov exponent and
the turbulent time, however, does not seem to exist for the nonlinear intermittent
system: it would lead to the erroneous conclusion that the Liapounov exponent in
the latter case is also identically zero in the absence of noise.
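The quantities being compared here can be estimated directly from the map. The sketch below assumes the standard recursion R_n = E + 2 − V_n − 1/R_{n−1} for the ratio R_n = ψ_{n+1}/ψ_n (a reconstruction consistent with the band [−4, 0] and with log|f'(R_n)| = −2 log|R_n| quoted above); the disorder strength W and all other parameters are illustrative.

```python
import math
import random

def anderson_orbit(E, W, n, seed=1):
    """Iterate the Anderson-model map with V_n uniform on [-W/2, W/2].
    Returns (gamma, turbulent_fraction): gamma = (1/n) * sum log|R_k|
    estimates the inverse localization length, while the fraction of
    steps with R_k < 0 counts sign changes of the wave function and so
    plays the role of the "turbulent time" (integrated density of states)."""
    rng = random.Random(seed)
    R, log_sum, neg = 1.5, 0.0, 0
    for _ in range(n):
        V = rng.uniform(-W / 2.0, W / 2.0)
        R = E + 2.0 - V - 1.0 / R
        log_sum += math.log(abs(R))
        neg += R < 0.0
    return log_sum / n, neg / n

gamma, turb = anderson_orbit(E=-2.0, W=3.0, n=20_000)
# With disorder on, gamma > 0 (band states localized); the text notes the
# exponent vanishes inside the band when the stochastic term is switched off.
```

Near the band center roughly half of the R_n are negative, so the "turbulent time" tracks the node count of the wave function as the text describes.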
On the other hand, the thermodynamic description of Eq. (1)21 does provide a connection between the entropy S (here equivalent to the Liapounov exponent) and the density of the laminar phase ρ (i.e., one minus the fraction of the turbulent time). Both S and ρ are functions of the two thermodynamic variables β (inverse of temperature) and μ (chemical potential), and are related to each other by the fundamental equation of thermodynamics.
FIGURE 2 The map Eq. (25) in the absence of the noise term. It arises from a one-dimensional Anderson model.
ACKNOWLEDGMENT
It is a pleasure to thank warmly Professor G. Nicolis for his continuous help, en-
couragement, and fruitful correspondence. This work was partly supported by the
Department of Energy under contract number DE-AS05-81ER10947. Sincere thanks
are also due to the Center for Statistical Mechanics at University of Texas for fi-
nancial support of my attendance at the Santa Fe Institute Workshop.
Intermittent Fluctuations and Complexity 329
REFERENCES
1. Ben-Mizrachi, A., I. Procaccia, N. Rosenberg and A. Schmidt. "Real and Ap-
parent Divergences in Low-Frequency Spectra of Nonlinear Dynamical Sys-
tems." Phys. Rev. A31 (1985):1830-1840.
2. Bergé, P., Y. Pomeau and Ch. Vidal. L'ordre dans le Chaos. Paris: Hermann,
1984.
3. Bouchaud, J. P., and P. Le Doussal. "Intermittency in Random Optical Lay-
ers at Total Reflection." J. Phys. A: Math. Gen. 19 (1986):797-810.
4. Chaitin, G. Algorithmic Information Theory. Cambridge: Cambridge Univer-
sity Press, 1987.
5. Derrida, B., and E. Gardner. "Lyapounov Exponent of the One-Dimensional
Anderson Model: Weak Disorder Expansion." J. Physique 45 (1984):1283-
1295.
6. Eckmann, J.-P., and D. Ruelle. "Ergodic Theory of Chaos and Strange At-
tractors." Rev. Mod. Phys. 57 (1985):617-656.
7. Fisher, M.E. "The Theory of Condensation and the Critical Point." Physics,
(published in Great Britain) 3 (1967):255-283.
8. Gaspard, P., and X.-J. Wang. "Sporadicity: Between Periodic and Chaotic
Behaviors." Proc. Natl. Acad. Sci. 85 (1988):4591-4595.
9. Geisel, T., J. Nierwetberg, and A. Zacherl. "Accelerated Diffusion in the
Josephson Junctions and Related Chaotic Systems." Phys. Rev. Lett. 54
(1985):616-619.
10. Kendall, D. G. "Geometric Ergodicity and the Theory of Queues." In Mathe-
matical Methods in the Social Sciences, edited by K. J. Arrow, S. Karlin, and
P. Suppes. Palo Alto: Stanford University Press, 1960,176-195.
11. Kolmogorov, A. N. "Combinatorial Foundations of Information Theory and
the Calculus of Probabilities." Russian Math. Surveys 34 (1983):29-40.
12. Mandelbrot, B. B. "Sporadic Random Functions and Conditional Spectral
Analysis: Self-Similar Examples and Limits." In Proceedings of the Fifth
Berkeley Symposium on Mathematical Statistics and Probability, edited by L.
LeCam and J. Neyman. Berkeley: University of California Press, 1967,155-
179.
13. Montroll, E. W., and B. J. West. "On an Enriched Collection of Stochastic
Processes." In Fluctuation Phenomena, edited by E. W. Montroll and J. L.
Lebowitz. Revised edition. Amsterdam: North-Holland, 1987,61-206.
14. Nicolis, G., and C. Nicolis. "Master-Equation Approach to Deterministic
Chaos." Phys. Rev. A38 (1988):427-433.
15. Nicolis, G., C. Nicolis and E. Tirapegui. "Chaotic Dynamics, Markovian
Coarse-Graining and Information." Preprint, University of Brussels, 1989.
16. Paladin, G., and A. Vulpiani. "Anomalous Scaling Laws in Multifractal Ob-
jects." Phys. Rep. 156 (1987):147-225.
17. Seneta, E. Non-Negative Matrices. New York: John Wiley & Sons, 1973.
We simply do not know how well the brain tackles computational problems of
varying degrees of complexity.
How perceptive are we? Suppose that it could be established that the visual
system performs optimally. Then we can go on to ask what sort of computation
is necessary in order to achieve this level of performance, and to explore in detail
the types of design that may be required.
2. In the perception literature, a number of models of visual processing have been
proposed. We raise the question of how such models can be falsified by ex-
periment. Obviously, it is of prime importance to progress from qualitative
discussions to quantitative tests, in order to determine whether a given model
is in fact viable. We try to compute the performance that the visual system is
capable of according to each of these models. If this performance falls signifi-
cantly below the experimentally measured performance, then clearly the model
can be ruled out. A large class of perceptual models corresponds to the steepest
descent or mean field approximation in our formulation. An important issue is
whether this approximation is adequate in explaining human performance.
PERFORMANCE
OPTIMAL PERFORMANCE
To discuss how well the visual system performs, we must immediately raise an
obvious question: What are we to compare the performance of the visual system
with? The only natural standard, it appears to us, is the optimal performance
allowed by information theory, that is, the performance attainable if every bit of
information received by the visual system is used.
First, we must choose a "naturalistic" task well suited to the visual system but
which also allows a precise mathematical formulation amenable to rigorous analysis.
Since we suspect that the visual system can in fact perform at or near optimum,
we also want the task to be computationally difficult. We chose the discrimination
between patterns with noise and distortion added.[1]
More precisely, we propose an experiment in which the subject is first acquainted with two patterns, described by φ₀(x) and φ₁(x). Here x denotes the coordinates of the two-dimensional visual field. A black and white pattern is described by a scalar field φ(x), where φ(x) is equal to the contrast, that is, the logarithm of the intensity of the pattern at the point x.
For each trial, the experimenter chooses either φ₀ or φ₁, with equal probability, say. Suppose φ₀ is chosen. Then the pattern is distorted and obscured with noise
[1]Experiments similar to (but simpler than) the ones proposed here have been done by Barlow.2,3 Also see Bialek and Zee.4-7
Information Processing in Visual Perception 333
so that the subject actually sees φ(x) = φ₀(y(x)) + η(x). Here x → y(x) defines an arbitrary one-to-one mapping of the plane onto itself. The noise η(x) is taken for simplicity to be Gaussian and white. Thus, the conditional probability of seeing φ(x) were φ₀ chosen is given by

    P(φ|φ₀) = (1/Z) ∫ Dy e^{−W(y) − (β/2) ∫ d²x [φ(x) − φ₀(y(x))]²} .

The functional W(y) should favor gentle distortions, for which y(x) ≈ x. More on W later. Here Z is a normalization factor required by ∫ Dφ P(φ|φ₀) = 1. Henceforth, we will often neglect to write the normalization factor. Evidently, a probability P(φ|φ₁) can also be defined by substitution. The subject is to decide whether the pattern seen corresponds to φ₀ or φ₁.
The patterns φ₀ and φ₁ should be abstract so as to eliminate any possible
biological or cultural bias, such as those associated with our finely developed ability
to recognize human faces. We are interested in perception rather than cognition.
DISCRIMINABILITY
The information-theoretic optimal performance can then be computed according
to standard signal detection theory. A particularly relevant quantity is the discriminability. Define the discriminant as Λ(φ; φ₀, φ₁) = log[P(φ|φ₀)/P(φ|φ₁)]. With this definition, the discriminant is positive when the probability P(φ|φ₀) is larger than P(φ|φ₁) and negative when the opposite holds. (The logarithmic form for the discriminant is chosen for convenience. Some other monotonic function of the ratio of the two probabilities P(φ|φ₀) and P(φ|φ₁) may serve equally well.) It can be shown that optimal discrimination is accomplished by maximum likelihood. In plain English, the optimal strategy is to identify the pattern as φ₀ if Λ(φ; φ₀, φ₁) is positive, and as φ₁ if Λ(φ; φ₀, φ₁) is negative. This is, of course, precisely the strategy that
any sensible person capable of knowing Λ will adopt. Having seen the image φ(x), we have to decide whether it is more likely that the image "came" from φ₀(x) or from φ₁(x). (Thus, the experiment implies a "learning phase" in which the subject tries to "figure out" the relevant probability distributions. We are interested in the performance reached after learning. This, of course, accounts for our insistence on "naturalistic" tasks, for which the necessary learning has already been accomplished through eons of evolution.)
The probability distribution of Λ if φ₀ is chosen is defined by P(Λ|φ₀; φ₀ vs. φ₁) = ∫ Dφ δ(Λ(φ; φ₀, φ₁) − Λ)P(φ|φ₀). Similarly, P(Λ|φ₁; φ₀ vs. φ₁) can be defined. The discriminability, conventionally called (d')², is defined as

    (d')² = [⟨Λ⟩₀ − ⟨Λ⟩₁]² / {½ [⟨(ΔΛ)²⟩₀ + ⟨(ΔΛ)²⟩₁]} ,

where the subscript i = 0, 1 indicates that the corresponding expectation value should be taken in the distribution P(Λ|φ₀; φ₀ vs. φ₁) and P(Λ|φ₁; φ₀ vs. φ₁), respectively. The meaning of (d')² is obvious: it measures the overlap between the two probability distributions when the two distributions are bell shaped. As the name suggests, the discriminability (d')² limits the extent to which one can discriminate between φ₀ and φ₁. We are generally interested in the regime (d')² ≈ 0, when the visual discrimination task is highly "confusing." (When the distributions are bell shaped, the discriminant can obviously be related to the percentage of correct guesses. Incidentally, the discriminability (d')², rather than some other more-or-less equivalent quantity, is used because, being formed of "naturally occurring" expectation values, it can be readily computed for certain simple problems and because experimentalists in this field typically quote their observations in terms of (d')². The discriminability provides a convenient summary of the information contained in the two P(Λ)'s.)
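The relation between (d')² and the percentage of correct guesses is explicit in the bell-shaped (Gaussian) case. The toy sketch below works with scalar Gaussian Λ distributions rather than the full functional integral (a deliberate simplification): the ideal observer answers φ₀ whenever Λ > 0, and its probability of a correct call is Φ(d'/2), with Φ the standard normal cumulative distribution.

```python
import math

def d_prime_squared(mu0, mu1, var):
    """(d')^2 for two Gaussian Lambda distributions with common variance:
    the squared separation of the means over the variance."""
    return (mu0 - mu1) ** 2 / var

def percent_correct(d2):
    """Ideal-observer probability of a correct call, Phi(d'/2)."""
    return 0.5 * (1.0 + math.erf(math.sqrt(d2) / (2.0 * math.sqrt(2.0))))
```

(d')² = 0 means the two Λ distributions coincide and performance sits at chance (50%); a (d')² of a few units already permits nearly perfect discrimination, which is why the "confusing" regime of interest is (d')² near zero.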
A quite different theory would suggest that we look for and identify "features"
such as edges between predominantly black areas and predominantly white areas
in the patterns φ₀ and φ₁.
Suppose experiments show the actual performance to be substantially below
optimal performance. What would that mean? It might mean that the visual sys-
tem is capable of only a crude approximation in evaluating the functional integral
involved. It would then be interesting to determine what approximation the visual
system uses. This is certainly possible in principle. Alternatively, it might mean that
the visual system can evaluate the relevant functional integral fairly accurately, but
that the W used by the experimenter does not correspond to the W that we "carry
in our heads."
In principle, the experimenter can carry out a series of experiments, each with a
different W, all corresponding to "reasonable" choices. Suppose the optimal perfor-
mance can be determined for each W. It could happen that the actual performance
does not come close to the optimal performance for any of these W's. Perhaps
more interestingly, it could also happen that the actual performance reaches or
comes close to the optimal performance for some W's.
The correspondence with statistical mechanics also suggests the question of
whether some sort of universality might play an essential role in visual perception.
We can also ask whether the corresponding statistical-mechanical system exhibits
short-ranged or long-ranged correlation. In this connection, we may perhaps em-
phasize that two logically distinct issues surface in our program. First, we have the
question of whether the visual system can attain the optimal performance theoret-
ically attainable. Next, given that this optimal performance is in fact attained, we
can ask what are the computations necessary to attain this performance.
We would like to conjecture that actual performance does in fact come close to
optimal performance for some reasonable W. If experiments verify our conjecture,
then we are confronted by the interesting issue of the type of circuitry and algorithm
capable of effectively evaluating the functional integral involved.
strongly non-local functionals of image intensity, no such abrupt drop will be ob-
served. These experiments will be difficult, but they have the potential of providing
serious challenges to our understanding of computation in the nervous system.
Our suspicion is that the system can solve non-local problems, and that there
are interesting theoretical questions to be answered about the algorithms and hard-
ware responsible for such contributions. Suspicions aside, the approach described
here4,7 provides the tools for asking very definite questions about the computational
abilities of the brain.
Unfortunately, it is well-nigh impossible to evaluate functional integrals exactly.
After all, the exact evaluation of a functional integral amounts to the exact solution
of a statistical-mechanical system or of a quantum field theory. The history of
statistical mechanics and quantum field theory testifies amply to the difficulty of
the task. Thus, in our work we are reduced to trying various approximations to
the functional integrals, often reaching only qualitative conclusions. (Of course, the
functional integrals can also be evaluated numerically.)
Instead of trying to evaluate P(φ|φ₀), we have also tried to extract some general features. In particular, we have considered using the renormalization group to study the properties of P(φ|φ₀) in an attempt to discover a strategy for "universal computation" in processing visual information.8
It is the interplay between noise and distortion that makes the evaluation of the
field theory defined by P(0100) so difficult. If either noise or distortion is omitted,
the task of evaluating P(014) becomes considerably simpler. (In particular, with
no distortion, the problem becomes Gaussian and trivial.) Why do we make life
miserable for ourselves? Because we want to appreciate the difficulty of a task that
the visual system performs extremely well (at least according to the subjective
evidence). Indeed, our work represents largely a record of our awakening to how
difficult the computations involved are. The difficulty of this task is also reflected in
the fact that machines with artificial vision have not mastered this task of "invariant
perception." Indeed, as far as we know, current machines have difficulty recognizing
images if the image can be arbitrarily rigidly translated, rotated, and dilated. Of
course, it may also turn out that our visual system does not perform as well as we
think it does.
MODELS
FEATURE DETECTORS
In the second part of our work, we attempt to capture, in quantitative models,
the essence of some leading theories of perception. We then compute the predicted
performance at a "naturalistic" perceptual task, with the aim of ultimately compar-
ing whatever results we may obtain with actual experiments on the human visual
system.
Information Processing in Visual Perception 337
For example, consider the feature-detector theory, which originated in the neurophysiological experiments of the 1950's.[2] Neurons in the visual system are assumed
to compute nonlinear functionals of the image intensity and thus signal the presence
of features in the image. Thus, the continuous pattern φ(x) is converted into a set of discrete "feature tokens" to be processed by subsequent layers of neurons. We
attempt to capture the essence of this theory by taking the simplest possibility for the feature tokens: they are Ising spins σ_μ, located at x_μ, μ = 1, 2, …, N, with σ_μ taking on values ±1. The image is sampled at x_μ to give φ_μ = ∫ d²x f(x − x_μ)φ(x), where f(x − x_μ) represents the response function of a feature-detector neuron located at x_μ. The response function f(x), with its excitatory center and inhibitory surround, is well known to neurophysiologists.13 It is often modeled as the Laplacian of a Gaussian ∇²G or as the difference of two Gaussians.[3]
Our model is that σ_μ tends to be +1 when φ_μ is positive and −1 when φ_μ is negative, as described by some probability distribution P(σ|φ). Putting it together, we have the conditional probability P(σ|φ₀) = ∫ Dφ P(σ|φ)P(φ|φ₀). In other words, the experimenters (or the natural environment we live in) turn the known image φ₀ into φ. The seen image φ is then processed into the "feature tokens" σ_μ.
We believe that this "Ising" model is prototypical of a large family of models which replace the continuous image φ(x) by discrete and local feature tokens. It
contains one of the classic feature-detector ideas concerning the extraction of edges,
a concept formalized by Marr, Poggio, and others as the location of contours where
some appropriately filtered version of the image vanishes. Here the "domain walls"
between spin-up and -down regions mark the zero-crossing contours, so in fact this
spin representation has a bit more information than a "sketch" based on zero-
crossing contours alone.
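A minimal sketch of the token-forming step: sample the image with difference-of-Gaussians (center-surround) filters and keep only the sign, giving the Ising spins σ_μ. All numerical parameters here (filter widths, sampling points, the test image) are hypothetical.

```python
import math

def dog(r2, sigma_c=1.0, sigma_s=2.0):
    """Difference-of-Gaussians profile: excitatory center, inhibitory surround."""
    center = math.exp(-r2 / (2 * sigma_c ** 2)) / (2 * math.pi * sigma_c ** 2)
    surround = math.exp(-r2 / (2 * sigma_s ** 2)) / (2 * math.pi * sigma_s ** 2)
    return center - surround

def feature_tokens(phi, centers):
    """phi: dict {(x, y): contrast}.  Returns the Ising spins sigma_mu = +/-1,
    the sign of the filtered image phi_mu at each sampling point x_mu."""
    spins = []
    for (cx, cy) in centers:
        phi_mu = sum(v * dog((x - cx) ** 2 + (y - cy) ** 2)
                     for (x, y), v in phi.items())
        spins.append(1 if phi_mu >= 0 else -1)
    return spins

# A single bright point at the origin: the cell centered on it goes to +1,
# while a cell three units away sees mostly inhibitory surround and goes to -1.
spins = feature_tokens({(0, 0): 1.0}, [(0, 0), (3, 0)])
```

The domain wall between the +1 and −1 cells marks a zero crossing of the filtered image, exactly the edge-token idea described above.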
The point is that these models are sufficiently well defined so that they can be
studied numerically or analytically in certain limits. For instance, if the range of the response function f(x) is small compared to the intercellular spacing (which is not biologically reasonable), the maximum efficiency can be seen to be 2/π ≈ 0.64, which appears low compared to experimental reports of efficiency ranging from 0.5 to 0.95.
We conclude that overlaps of the receptive fields of neighboring cells are essential
for understanding the observed efficiency of visual perception. Furthermore, these
overlaps must be negative to enhance (d')², which necessitates the excitatory-center, inhibitory-surround type of organization found for real neurons. Obviously, we can go on to consider variations of this model. For instance, the work of Hubel and Wiesel established that certain cells are selectively sensitive to directions.13 Thus, instead of Ising spins, we can consider vector ("Heisenberg") spins.
[2]For a brief review of the history of feature detectors, see Barlow.2,3
[3]For example, see Kuffler12 and Parker and Hawken15; also Albrecht,1 p. 117.
338 A. Zee
LINEAR FILTERS
Another class of models that we have considered supposes that the detectors in
the visual system act as linear filters.9 In other words, each detector functions as
a narrow-band Fourier analyzer centered at some characteristic spatial frequency.
Models of this type are suggested by the work of Campbell, Robson, Lawden, and
DeValois.10,11
    P(φ|φ₀) = (1/Z) ∫ dρ e^{−(β/2) ∫ d²x [φ(x) − φ₀(x_ρ)]²} ,

where ρ parametrizes a family of distortions. For example, ρ can be an angle θ and x_ρ is equal to x rotated through θ. We also take φ₁ = 0 for simplicity. Thus, the subject is to decide on the presence or absence of the prototype pattern φ₀ with the pattern obscured by noise and presented with a randomly chosen orientation on each trial. We are interested in the regime in which the noise becomes overwhelming. (By simple scaling, we see that quantities such as (d')² depend only on the product of β with the squared pattern amplitude, so the β can be absorbed.) If ∫ d²x φ₀²(x_ρ) = ∫ d²x φ₀²(x), as is the case for the examples we have considered, the relevant functional integral can be organized in the suggestive form
    P(φ|φ₀) = (1/Z) ∫ dρ e^{−βH₀(φ) − βH₁(φ, ρ)} ,

where the steepest-descent (best-match) evaluation of the integral over ρ defines (d')²_steepest descent, the efficiency is the ratio ε = (d')²/(d')²_steepest descent, and the overlap of two distortions is

    Π(ρ, ρ') = ⟨H₁(φ, ρ) H₁(φ, ρ')⟩ = 2β ∫ d²x φ₀(x_ρ) φ₀(x_ρ') .
To see what is actually going on, we can now try out various specific prototype
pictures φ₀(x). For example, we have considered a wedge- or leaf-shaped picture φ₀(x) = f(r)e^{−θ²/[2C(r)]}, where r and θ are the polar coordinates of x. The important quantity here is the width of the wedge C(r) (which we take to be small).
We find that ε is determined by the radial averages ⟨C⁻¹⟩ and ⟨C⁻²⟩, where ⟨···⟩ denotes an average over the radial direction weighted by f²(r) and geometrical factors. If C(r) = constant (so that the picture is wedge shaped), ε reduces to a pure number of order unity. If the picture is very jagged, so that ⟨C⁻²⟩ ≫ ⟨C⁻¹⟩², then ε = ⟨C⁻¹⟩²⟨C⁻²⟩⁻¹.
How can we use this analysis to find out if the visual system is actually an
efficient device that locates minima or "best matches" (as simple neural network
models would suggest)? Suppose the experiment outlined is done and the measured
efficiency comes out to be equal to the value predicted by steepest descent. That
would offer dramatic support for the idea of "best match." On the other hand, if
the efficiency is measured to be greater than the predicted efficiency, that would
rule out or at least cast grave doubt on the "best match" theory. Unfortunately, the
situation is complicated by the possibility that information is lost by processing,
for instance, by feature detectors. Thus, one would have to consider the steepest descent approximation in evaluating the integral over ρ not in P(φ|φ₀), but in P(σ|φ₀) = ∫ Dφ P(σ|φ)P(φ|φ₀) (with σ denoting some feature "tokens"). For the calculation outlined here to be relevant, we have to suppose that processing affects both (d')²_steepest descent and (d')² in the same proportion so that the effect cancels out in ε. Note, however, that the experiment can be repeated and the theoretical expression for ε can be evaluated (numerically at least) for a wide variety of prototype pictures φ₀.
SUMMARY
In summary, we have outlined a systematic program[5] to address some quantitative
issues that we must resolve in order to understand visual perception. An important
point is that images used in vision experiments should be generated from statistical
ensembles that may be formulated analytically. Advances in high-speed compu-
tation should make possible this type of controlled experiment and at the same
time facilitate the analysis of models of how the human perceptual system tries to
determine these statistical ensembles.
SOME QUESTIONS
In conclusion, I would like to pose the following list of questions as a challenge to
vision researchers.
1. What is the optimal performance allowable for various perceptual tasks?
2. What are the computations needed to reach this optimal performance? Can we
identify the issues involved (for instance, local versus non-local computation)?
3. Is the visual system actually capable of this optimal performance? How close
does it come? (These questions can be answered only by experiments, of course.)
4. If the performance of the visual system approximates optimal performance, how
does it perform the computations identified in question 2 above? What neural
circuitry and algorithm can carry out these computations? Can various simple
models be ruled out?
5. Does the visual system operate by optimization? Is the performance reached
by the steepest descent approximation in accordance with observation?
6. Are there universal features and properties in the sense of statistical physics?
As is evident by the preceding discussion, we have touched only on the begin-
nings of this program and have reached only qualitative conclusions at best. Many
challenging problems remain.
[5]A somewhat fuller account of the discussion presented here may be found in Zee.16
ACKNOWLEDGMENTS
I am indebted to W. Bialek for numerous stimulating and interesting discussions.
This research was supported in part by the National Science Foundation under
Grant No. PHY82-17853, supplemented by funds from the National Aeronautics
and Space Administration, at the University of California at Santa Barbara.
REFERENCES
1. Albrecht, D. G., ed. Recognition of Pattern and Form. New York: Springer-
Verlag, 1982.
2. Barlow, H. B. "The Past, Present, and Future of Feature Detectors." In
Albrecht,1 4.
3. Barlow, H. B. "The Absolute Efficiency of Perceptual Decision." Phil. Trans.
Roy. Soc. London B290 (1980):71.
4. Bialek, W., and A. Zee. Phys. Rev. Lett., 58 (1987):741.
5. Bialek, W., and A. Zee. "Understanding the Efficiency of Human Percep-
tion." Phys. Rev. Lett., 61 (1988):1512.
6. Bialek, W., and A. Zee. "Inadequacy of Mean Field Approximation in Visual
Perception." In preparation.
7. Bialek, W., and A. Zee. "Invariant Perception: A Functional Integral and
Field Theoretic Approach." In preparation.
8. Bialek, W., and A. Zee. "Recognizing Ensembles of Images: Universality at
Low Resolution." In preparation.
9. Bialek, W., and A. Zee. "Linear Filter Models in Visual Perception." In
preparation.
10. Campbell, F. W., and M. Lawden. "The Physics of Visual Perception." In
Albrecht,1 146.
11. DeValois, R. L. "Early Visual Processing: Feature Detection or Spatial Filter-
ing." In Albrecht.1
12. Kuffler, S. W. "Discharge Patterns and Functional Organization of Mam-
malian Retina." J. Neurophysiol. 16 (1953):57.
13. Levine, M. W., and J. M. Shefner. Fundamentals of Sensation and Percep-
tion. New York: Random House, 1981.
14. Marr, D. Vision. New York: W.H. Freeman & Co., 1982.
15. Parker, A., and M. Hawken. "Capabilities of Monkey Cortical Cells in Spatial
Resolution Tasks." J. Opt. Soc. Am. 2 (1985):1101.
16. Zee, A. "Some Quantitative Issues in the Theory of Perception." In Evolu-
tion, Learning, and Cognition, edited by Y.C. Lee. Singapore: World Scien-
tific, 1989.
V Probability, Entropy, and
Quantum
Asher Peres
Department of Physics, Technion—Israel Institute of Technology, 32 000 Haifa, Israel
INTRODUCTION
Thermodynamics, relativity and quantum theory are the three pillars upon which
the entire structure of theoretical physics is built. They are not branches of physics
(like acoustics, optics, etc.) but general frameworks encompassing every aspect of
physics. Thermodynamics—for which a more appropriate name would have been
"thermostatics"—governs the convertibility of various forms of energy; relativity
theory deals with measurements of space and time; and quantum theory is a set of
rules for computing probabilities of outcomes of tests (also called "measurements")
following specified preparations.14
ENTROPY
The purpose of this section is to prove that von Neumann's definition of entropy is
equivalent to that of standard thermodynamics. I hope that this proof will be more
readable, and also more convincing, than the one found in von Neumann's classic
book.15 I shall use for this proof some recent results due to Partovi.9
Thermodynamic Constraints on Quantum Axioms 347
S = −N Σ_j c_j log c_j ,  (1)
where N is the total number of molecules, and c_j is the concentration of the jth
species. (Units are chosen so that Boltzmann's constant equals 1. Temperature is
therefore measured in energy units and entropy is dimensionless.) The derivation
of Eq. (1) relies on the possibility of making semipermeable membranes which are
transparent to type j molecules and opaque to all others. These membranes are used
as pistons in an ideal frictionless engine, immersed in an isothermal bath at temperature T, as sketched in Figure 1. It is easily shown21 that a reversible separation of the mixed gases must supply an amount of isothermal work −NT Σ_j c_j log c_j. This
work is converted into heat and released into the reservoir. Therefore the mixing
entropy is given by Eq. (1).
von Neumann's definition of entropy of a quantum state closely parallels the
above argument. It assumes that there are semipermeable membranes capable of
separating orthogonal states with 100% efficiency—this is indeed the operational
meaning of "orthogonal states." The fundamental problem is whether it is legitimate
to treat quantum states in the same way as classical ideal gases, and in particular
why one should expect thermal equilibrium to be achieved.
In his proof, von Neumann15 relies on a subterfuge proposed by Einstein4 in
1914, in the early days of the "old" quantum theory. Consider many similarly pre-
pared quantum systems, such as Bohr's planetary atoms. Each one is enclosed in a
large box with impenetrable walls, so as to prevent any interaction between these
quantum systems. All these boxes are then placed into an even larger container,
where they behave as an ideal gas, because each box is so massive that classical
mechanics is valid for its motion (i.e., there is no need of Bohr-Sommerfeld quan-
tization rules—remember that we are in 1914). The container itself has ideal walls
which may be, according to our needs, perfectly conducting, perfectly insulating,
or with properties equivalent to those of semipermeable membranes. These "mem-
branes" are endowed with automatic devices able to peek inside the boxes and to
test the state of the quantum systems enclosed therein. In his book,15 von Neumann
insists (p. 359) that the practical infeasibility of this contraption does not impair
its demonstrative power: "In the sense of phenomenological thermodynamics, each
conceivable process constitutes valid evidence, provided that it does not conflict
with the two fundamental laws of thermodynamics." He then shows that Eq. (1)
can be recast in the form
S = −N Tr(ρ log ρ) ,  (2)

where ρ is the density matrix representing the state of a molecule of our gas. The c_j of Eq. (1) correspond to the eigenvalues of ρ.
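The correspondence between Eqs. (1) and (2) is easy to check numerically. The following Python sketch (an illustration added here, with NumPy assumed) computes −Tr ρ log ρ from the eigenvalues of ρ and verifies that a diagonal density matrix reproduces the classical mixing entropy of Eq. (1):

```python
import numpy as np

def von_neumann_entropy(rho):
    """S = -Tr(rho log rho), computed from the eigenvalues of rho."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]            # 0 log 0 = 0 by convention
    return float(-np.sum(w * np.log(w)))

# A diagonal density matrix reproduces the classical mixing entropy, Eq. (1):
# the concentrations c_j become the eigenvalues of rho.
c = np.array([0.5, 0.3, 0.2])
rho = np.diag(c)
S_mix = -np.sum(c * np.log(c))
assert np.isclose(von_neumann_entropy(rho), S_mix)
```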
The problem remains whether this hybrid classical-quantal reasoning is con-
sistent. My purpose here is to give a genuinely quantal proof of the equivalence of
FIGURE 1 Ideal engine used to separate gases A (to the left) and B (to the right). The
vertically and horizontally hatched semipermeable pistons are transparent to gases A
and B, respectively. The mechanical work that must be supplied in order to transform
the initial state into the final state is released as heat into the thermal bath.
von Neumann's entropy, Eq. (2), with the ordinary entropy of classical thermodynamics.
A mixed state can be prepared by the following recipe: Let a random process have probability λ to "succeed" and probability (1 − λ) to "fail." In case of success, prepare the quantum system according to ρ1. In case of failure, prepare it according to ρ2. This process results in a ρ given by

ρ = λ ρ1 + (1 − λ) ρ2 .  (3)
Indeed, if the above instructions are executed a large number of times, the average value obtained for subsequent measurements of any observable A is

⟨A⟩ = λ Tr(ρ1 A) + (1 − λ) Tr(ρ2 A) = Tr(ρ A) .  (4)
What I find truly amazing in this result is that, once ρ is given, it contains all the available information and it is impossible to reconstruct from it ρ1 and ρ2!
For example, if we prepare a large number of polarized photons and if we toss a
coin to decide, with equal probabilities, whether the next photon to be prepared
will have vertical or horizontal linear polarization, or, in a different experimental
setup, we likewise randomly decide whether each photon will have right-handed or
left-handed circular polarization, we get in both cases the same
ρ = (1/2) [ 1  0
            0  1 ] .
An observer receiving megajoules of these photons will never be able to discover
which one of these two methods was chosen for their preparation, notwithstanding
the fact that these preparations are macroscopically different. (If this were not
true, EPR correlations would allow instantaneous transfer of information to distant
observers, in violation of relativistic causality.6)
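The indistinguishability of the two preparations can be verified directly. A short Python illustration (not part of the original argument; the polarization states are written as standard Jones vectors):

```python
import numpy as np

# Polarization states as Jones vectors.
vert  = np.array([1, 0], dtype=complex)
horiz = np.array([0, 1], dtype=complex)
right = np.array([1,  1j], dtype=complex) / np.sqrt(2)
left  = np.array([1, -1j], dtype=complex) / np.sqrt(2)

def projector(psi):
    return np.outer(psi, psi.conj())

# Preparation 1: coin toss between vertical and horizontal linear polarization.
rho_lin  = 0.5 * projector(vert)  + 0.5 * projector(horiz)
# Preparation 2: coin toss between right- and left-handed circular polarization.
rho_circ = 0.5 * projector(right) + 0.5 * projector(left)

# Both give the maximally mixed state (1/2) * identity:
# no measurement whatever can tell the two preparations apart.
assert np.allclose(rho_lin, np.eye(2) / 2)
assert np.allclose(rho_circ, np.eye(2) / 2)
```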
Another example would be to prepare photons having, with equal probabili-
ties, linear vertical polarization or circular right-handed polarization. An observer
requested to guess what was the preparation of a particular photon, under the best
conditions allowed by quantum theory, would be able to give the answer with cer-
tainty in only 29.3% of cases.7,11 It will be shown below that a "superobserver"
who could always give an unambiguous answer would also be able to extract an
infinite amount of work from an isothermal reservoir.
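The 29.3% figure can be reproduced from the result cited above,7,11 which gives 1 − |⟨φ|ψ⟩| as the optimal probability of an unambiguous answer for two equiprobable pure states. A Python check (an illustration, assuming that formula):

```python
import numpy as np

vert   = np.array([1, 0], dtype=complex)                 # linear vertical
circ_R = np.array([1, 1j], dtype=complex) / np.sqrt(2)   # right-handed circular
overlap = abs(np.vdot(vert, circ_R))                     # |<phi|psi>| = 1/sqrt(2)
p_success = 1 - overlap                                  # optimal unambiguous discrimination
print(round(p_success, 3))                               # 0.293
```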
ρ = Z⁻¹ exp(−βH) ,  (6)
It can then be shown that the quantity S − β⟨E⟩ cannot decrease as a result of the collision. This follows from conservation of energy and convexity of entropy.16 Note that in Eq. (7), ρ and H0 refer to the quantum system, but β refers to the thermal reservoir with which it collided.
We further assume that the thermal reservoir is so large that its state after the collision can again be described by Eq. (6), with the same β, and in particular that it is justified to ignore its correlation to the state of the quantum system. We therefore expect that, after numerous collisions, S − β⟨E⟩ will reach the maximum value allowed by selection rules. If all the states of the quantum system are accessible (i.e., if there are no selection rules), the maximum value of S − β⟨E⟩ corresponds to a Gibbs state of the quantum system, at the same temperature β⁻¹ as the reservoir. Not every state, however, may be accessible. In particular, if the container
has passive walls interacting only with the R degrees of freedom but not with the
q variables, the internal state of the quantum system cannot be affected. In that
case, an ensemble of quantum systems described by Eq. (5) has the same statistical
properties as a classical ideal gas of free particles of mass M. In particular, it exerts
exactly the same pressure on the walls of the container. This is an immediate consequence of the evolution equation of the Wigner distribution function,18 which, for free particles, is identical to the Liouville equation in classical statistical mechanics.
Up to this point, nothing has been proved that has any consequence in the
real world. Only the fictitious degrees of freedom R were thermalized by multiple
collisions with the fictitious container. The situation becomes more interesting if
semipermeable partitions are introduced. As explained above, these partitions are
described by an interaction term involving q, R, and X (note that the classical
parameters X are prescribed functions of time and that their time dependence
must be extremely slow on time scales relevant to the quantum system; otherwise
the Born-Oppenheimer approximation would not be valid).
To describe a semipermeable partition, we have to add to the right-hand side
of Eq. (5) a term which, in the simplest case, has the form V(q, R, X). This term, if
suitably chosen, causes the formation of correlations between the variables q and R.
For example, we can concentrate particles with spin up in one part of the container
and those with spin down in the other part, by introducing an interaction
or heat transfer. We thereby obtain a mixture of the two polarization states. Its density matrix is

ρ = (1/4) [ 3  1
            1  1 ] .
The eigenvalues of ρ are 0.854 (corresponding to photons polarized at 22.5° from the
vertical) and 0.146 (for the opposite polarization). We now replace the "unusual"
membranes by ordinary ones, selecting these two orthogonal polarization states. The
next step is an isothermal compression, leading to state (d) where both chambers
have the same pressure and the same total volume as those in state (a). This
isothermal compression requires an expenditure of work
—nT (0.146 log 0.146 + 0.854 log 0.854) = 0.416 nT, (10)
which is released as heat into the reservoir. This is less than the amount nT log 2—
which was gained in the isothermal expansion from (a) to (b)—by the amount
0.277 nT. Finally, no work is involved in returning from (d) to (a) by suitable
rotations of polarization vectors (see von Neumann,15 p. 366). We have thereby
demonstrated the existence of a closed cycle whereby heat is extracted from an
isothermal reservoir and converted into work, in violation of the second law of
thermodynamics.
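The numbers in this cycle are easy to verify. A Python sketch (illustrative; it assumes, as inferred from the 22.5° eigenvectors above, that the mixture is the equal-weight combination of vertical and 45° linear polarizations):

```python
import numpy as np

# Equal mixture of vertical and 45-degree linear polarizations (assumed composition).
vert = np.array([1.0, 0.0])
diag = np.array([1.0, 1.0]) / np.sqrt(2)
rho = 0.5 * np.outer(vert, vert) + 0.5 * np.outer(diag, diag)

w = np.linalg.eigvalsh(rho)
print(np.round(w, 3))                      # [0.146 0.854]

# Work of isothermal compression, Eq. (10), in units of nT:
W_comp = -(0.146 * np.log(0.146) + 0.854 * np.log(0.854))
print(round(W_comp, 3))                    # 0.416
# Net work extracted per cycle, in units of nT:
print(round(np.log(2) - W_comp, 3))        # 0.277
```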
ρ = λ P_φ + (1 − λ) P_ψ ,  (11)
FIGURE 2 Cyclic process extracting heat from isothermal reservoir and converting it
into work, by using a semipermeable partition which selects non-orthogonal photon
states. Double arrows represent the linear polarizations of photon ensembles. The
symbol V is for vacuum.
where 0 < λ < 1 and where P_φ and P_ψ are projection operators on the pure states φ and ψ, respectively. The nonvanishing eigenvalues of ρ are

w_j = 1/2 ± [1/4 − λ (1 − λ)(1 − x)]^{1/2} ,  (12)

where x = |⟨φ|ψ⟩|². The entropy of this mixture, S = −Σ_j w_j log w_j, satisfies
dS/dx < 0 for any λ. Therefore, if the pure quantum states evolve as φ(0) → φ(t) and ψ(0) → ψ(t), the entropy of the mixture ρ shall not decrease (i.e., that mixture shall not become less homogeneous) provided that

|⟨φ(t)|ψ(t)⟩|² ≤ |⟨φ(0)|ψ(0)⟩|² .  (13)

In particular, if ⟨φ(0)|ψ(0)⟩ = 0, we must have ⟨φ(t)|ψ(t)⟩ = 0. Orthogonal states must remain orthogonal.
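The monotonic dependence of S on the overlap x can be checked numerically. A Python sketch (illustrative) evaluating the eigenvalues of Eq. (12) and the entropy of the mixture:

```python
import numpy as np

def mixture_entropy(lam, x):
    """Entropy of rho = lam*P_phi + (1-lam)*P_psi, overlap x = |<phi|psi>|^2 (Eq. 12)."""
    r = np.sqrt(0.25 - lam * (1 - lam) * (1 - x))
    w = np.array([0.5 + r, 0.5 - r])        # the two nonvanishing eigenvalues
    w = w[w > 1e-15]
    return float(-np.sum(w * np.log(w)))

# dS/dx < 0 for any lam: the entropy falls monotonically as the overlap grows.
xs = np.linspace(0.0, 0.999, 200)
for lam in (0.2, 0.5, 0.8):
    S = np.array([mixture_entropy(lam, x) for x in xs])
    assert np.all(np.diff(S) < 0)
```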
Consider now a complete orthogonal set φ_k. We have, for every ψ,

Σ_k |⟨φ_k|ψ⟩|² = 1 .  (14)
ACKNOWLEDGMENT
This work was supported by the Gerard Swope Fund and by the Fund for Encour-
agement of Research at Technion.
REFERENCES
1. Chirikov, B. V. "Transient Chaos in Quantum and Classical Mechanics."
Found. Phys. 16 (1986):39-49.
2. Datta, A., and D. Home. "Quantum Non-Separability Versus Local Realism:
A New Test using the B°B° System." Phys. Lett. A 119 (1986):3-6.
3. de Broglie, L. Une Tentative d'Interprétation Causale et Non Linéaire de la Mécanique Ondulatoire. Paris: Gauthier-Villars, 1956.
4. Einstein, A. "Beiträge zur Quantentheorie." Verh. Deut. Phys. Gesell. 16
(1914):820-828.
5. Einstein, A. "Quantentheorie der Strahlung." Phys. Z. 18 (1917):121-128.
6. Herbert, N. "FLASH-A Superluminal Communicator Based Upon a New Kind of Quantum Measurement." Found. Phys. 12 (1982):1171-1179.
7. Ivanovic, I. D. "How to Differentiate Between Non-Orthogonal States." Phys.
Lett. A 123 (1987):257-259.
8. Landé, A. Foundations of Quantum Theory. New Haven: Yale Univ. Press,
1955, 10-13.
9. Partovi, M. H. "Quantum Thermodynamics." Phys. Lett. A 137 (1989):440-444; see also contribution to the present volume.
10. Peres, A. "Relativity, Quantum Theory, and Statistical Mechanics are Com-
patible." Phys. Rev. D 23 (1981):1458-1459.
11. Peres, A. "How to Differentiate Between Non-Orthogonal States." Phys. Lett.
A 128 (1988):19.
12. Rosen, N., "On Waves and Particles." J. Elisha Mitchell Sci. Soc. 61 (1945):
67-73.
13. Shannon, C. "A Mathematical Theory of Communication." Bell Syst. Tech.
J. 27 (1948):379-423, 623-655.
14. Stapp, H. P. "The Copenhagen Interpretation." Am. J. Phys. 40 (1972):1098-
1116.
15. von Neumann, J. Mathematical Foundations of Quantum Mechanics. Prince-
ton, NJ: Princeton Univ. Press, 1955, 358-379.
16. Wehrl, A. "General Properties of Entropy." Rev. Mod. Phys. 50 (1978):221-
260.
17. Weinberg, S. "Particle States as Realizations (Linear and Nonlinear) of Space-time Symmetries." Nucl. Phys. B (Proc. Suppl.) 6 (1989):67-75.
18. Wigner, E. P. "On the Quantum Correction for Thermodynamic Equilib-
rium." Phys. Rev. 40 (1932):749-759.
19. Wigner, E. P. Group Theory. New York: Academic Press, 1959, 233-236.
20. Wootters, W. K., and W. H. Zurek. "A Single Quantum Cannot Be Cloned."
Nature 299 (1982):802-803.
21. Zemansky, M. W. Heat and Thermodynamics. New York: McGraw-Hill, 1968,
561-562.
Entropy and Quantum Mechanics

M. Hossein Partovi
Department of Physics, California State University, Sacramento, California 95819
Entropy is a natural and powerful idea for dealing with fundamental prob-
lems of quantum mechanics. Recent results on irreversibility and quantum
thermodynamics, reduction and entropy increase in measurements, and the
unification of uncertainty and entropy demonstrate the fact that entropy is
the key to resolving some of the long-standing problems at the foundations
of quantum theory and statistical mechanics.
INTRODUCTION
A distinctive feature of quantum theory is the highly nontrivial manner in which
information about the quantum system is inferred from measurements. This feature
is obscured in most discussions by the assumption of idealized measuring devices
and pure quantum states. While for most practical purposes these are useful and
reasonable approximations, it is important in dealing with fundamental issues to
recognize their approximate nature. This recognition follows from the simple observation that in general measuring devices cannot fully resolve the spectrum of the physical observable being measured, a fact that is self-evident in the case of observables with continuous spectra. A consequence of this remark is that in general realizable quantum states cannot be pure and must be represented as mixed states.1 Equivalently, a quantum measurement in general is incomplete in that it
fails to provide an exhaustive determination of the state of the system. The problem
of incomplete information, already familiar from statistical mechanics, communica-
tion theory and other areas, is thus seen to lie at the very heart of quantum theory.
In this sense, quantum mechanics is a statistical theory at a very basic level, and
there should be little doubt that entropy, a key idea in dealing with incomplete in-
formation, should turn out to play a central role in quantum mechanics as well. The
main purpose of the following account is to demonstrate this assertion by means of
examples drawn from recent work on the subject.
ENTROPY
To define entropy at the quantum level, we shall start with the notion of entropy as-
sociated with the measurement of an observable, the so-called measurement entropy.
We shall then show that ensemble entropy, given by the well-known von Neumann
formula, follows from our definition of measurement entropy by a straightforward
reasoning. Later, we will establish the identity of this ensemble entropy with ther-
modynamic entropy, so that there will be no distinction between "information" and
"physical" entropies (other than the fact that, strictly speaking, the latter is only
defined for equilibrium states).
In general, a measurement/preparation process involves a measuring device
D designed to measure some physical observable A. Let ρ̂ be the density matrix representing the state of the system and Â the operator representing observable A. Thus the quantum system is a member of an ensemble of similarly produced copies, some of which are subjected to interaction with the measuring device and serve to determine the state of the ensemble. The measuring device, on the other hand, involves a partitioning of the range of possible values of A into a (finite) number of bins, {a_i}, and for each copy of the system measured, determines in which bin the system turned up. In this way, a set of probabilities {P_i^A} is determined. What
becomes of the state of the copies that actually interact with the measuring device
is an important question (to be discussed later), but one that is distinct from
the issue of the state of the ensemble. Indeed many measurement processes are
partially or totally destructive of the measured copies of the system. The purpose
of the measurement/preparation process is thus to gain information about the state
of the ensemble by probing some of its members, often altering or even destroying
the latter in the process.
The fact that A represents a physical observable ensures that any partition of its spectrum generates a similar (orthogonal) partition of the Hilbert space, given by a complete collection of projection operators {π̂_i^A} in one-to-one correspondence with the bins {a_i}.
Entropy and Quantum Mechanics 359
S^A(ρ̂) = −Σ_i P_i^A ln P_i^A ,  (1)
One can show that the right-hand side of Eq. (2) is realized for Â = ρ̂. The corresponding minimum is then found to be the von Neumann expression for ensemble entropy, −Tr ρ̂ ln ρ̂. Starting from the elementary definition (1) for measurement entropy, we have thus arrived at the standard expression for ensemble entropy. We shall show later that S(ρ̂) coincides with the thermodynamic entropy, assuring us that information entropy and physical entropy are the same. For these reasons, we shall refer to the ensemble entropy, S(ρ̂), as the Boltzmann-Gibbs-Shannon (BGS) entropy also.
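The minimization just described can be illustrated numerically: measuring in the eigenbasis of ρ̂ attains −Tr ρ̂ ln ρ̂, while any other complete orthogonal measurement yields a larger measurement entropy. A Python sketch (an added illustration; the inequality it checks, that projective measurement entropy is bounded below by the von Neumann entropy, is standard):

```python
import numpy as np
rng = np.random.default_rng(0)

def measurement_entropy(rho, basis):
    """-sum_i P_i ln P_i, with P_i = <b_i|rho|b_i>, basis vectors in the columns of `basis`."""
    P = np.real(np.einsum('ji,jk,ki->i', basis.conj(), rho, basis))
    P = P[P > 1e-12]
    return float(-np.sum(P * np.log(P)))

rho = np.diag([0.7, 0.2, 0.1]).astype(complex)
S_vn = -sum(p * np.log(p) for p in (0.7, 0.2, 0.1))   # -Tr rho ln rho

# The eigenbasis of rho attains the von Neumann entropy ...
assert np.isclose(measurement_entropy(rho, np.eye(3, dtype=complex)), S_vn)
# ... and every other sharp (complete, orthogonal) measurement gives a larger value.
for _ in range(100):
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
    assert measurement_entropy(rho, Q) >= S_vn - 1e-9
```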
QUANTUM THERMODYNAMICS
Are the laws of thermodynamics—equivalently, any of the postulates commonly
adopted as the basis of statistical mechanics—independent laws of nature, or do
they in fact follow from the underlying dynamics? Ever since Boltzmann's brilliant
attempt at deriving thermodynamics from dynamics by means of his H-theorem,
there have been countless attempts at resolving this issue.11 We believe the question has now been settled at the quantum level,7 and it is our purpose here to define
thermodynamics for quantum systems and describe how the zeroth and second
laws actually follow from quantum dynamics without any further postulates (the
first and third laws are direct consequences of dynamical laws and need not be
considered).
Entropy and Quantum Mechanics 361
ΔS_a − β_b ΔU_a ≥ 0 .  (3)

Here ΔS_a and ΔU_a are the changes in the entropy and energy of system a, and β_b is the parameter characterizing the initial Gibbs state of system b. It is important to realize that the inequality in Eq. (3) is a nonequilibrium result, since, except
for the initial state of system b, all other states (including the final state of b)
will in general be nonequilibrium states. Furthermore, there is no implication in
Eq. (3) that the changes in entropy or energy of either system are in any way small.
Finally, appearances notwithstanding, the left-hand side of Eq. (3) is not related
to a change in the Helmholtz free energy of system a (a quantity which is only
defined for equilibrium states; besides, β_b is a property of the initial state of b and has nothing to do with system a).
The zeroth law can now be obtained from Eq. (3) by considering both systems a and b to be initially in Gibbs states. Then one has ΔS_a − β_a ΔU_a ≤ 0 as well as ΔS_b − β_b ΔU_b ≤ 0. These combine to give β_a ΔU_a + β_b ΔU_b ≥ ΔS_a + ΔS_b ≥ 0. Since ΔU_a + ΔU_b = 0 (conservation of energy), one has (β_a − β_b) ΔU_a ≥ 0. This inequality implies that the flow of energy is away from the system with the smaller value of the parameter. With β identified as inverse temperature, and the property established earlier that Gibbs states with the same value of β do not change upon interaction, we have arrived at the zeroth law of thermodynamics (note that in our units Boltzmann's constant equals unity).
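The direction of energy flow can be seen in a toy model: two two-level systems prepared in Gibbs states and coupled by an energy-conserving partial-swap unitary. The following Python sketch is an illustration added here (the partial-swap coupling is this sketch's assumption, not part of the original derivation):

```python
import numpy as np

def gibbs(beta, H):
    """Gibbs state diag(exp(-beta*E))/Z for a diagonal Hamiltonian H."""
    w = np.exp(-beta * np.diag(H))
    return np.diag(w / w.sum())

H = np.diag([0.0, 1.0])          # identical two-level Hamiltonians for a and b
I4 = np.eye(4)
SWAP = I4[[0, 2, 1, 3]]          # exchanges the two subsystems; commutes with H_a + H_b
theta = 0.3
U = np.cos(theta) * I4 + 1j * np.sin(theta) * SWAP   # energy-conserving partial swap

def dU_a(beta_a, beta_b):
    """Energy change of system a after one collision with b."""
    rho = np.kron(gibbs(beta_a, H), gibbs(beta_b, H))
    rho_f = U @ rho @ U.conj().T
    rho_a = rho_f.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)   # trace out b
    return float(np.real(np.trace(H @ rho_a) - np.trace(H @ gibbs(beta_a, H))))

# (beta_a - beta_b) * dU_a >= 0: energy flows away from the smaller-beta (hotter) system.
for ba, bb in [(0.5, 2.0), (2.0, 0.5), (1.0, 1.0)]:
    assert (ba - bb) * dU_a(ba, bb) >= -1e-12
```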
To derive the second law, consider a cyclic change of state for system a brought about by interaction with a number of systems b_i which are initially in equilibrium at inverse temperatures β_i. Each interaction obeys inequality (3), so that ΔS_ai − β_i ΔU_ai ≥ 0 for the ith interaction. Since in a cyclic change ΔS = ΔU = 0, it follows that Σ_i ΔS_ai = 0. Summing the inequality stated above on the index i, one arrives at

Σ_i β_i ΔU_ai ≤ 0 .  (4)
This inequality is a precise statement of the Clausius principle. Note that in conventional terms ΔU_ai would be the heat absorbed from system b_i, as explained earlier. Note also that system a need not be in equilibrium at any time during the cycle, and that the β_i only refer to the initial states of the systems b_i.
The Clausius principle established above is equivalent to the second law of
thermodynamics, and the entropy function defined from it is none other than the
one we have been using, namely the BGS entropy.
Further results on the approach to equilibrium, the unique role of the canonical ensemble in quantum thermodynamics, and the calculation of the rate of approach to equilibrium in a specific example can be found in Partovi.7
Here ω_i represents that state of the device which corresponds to the value of A turning up in the bin a_i. By contrast, Ω' represents a state of the device which corresponds to the state of the system being of non-diagonal form. Now in a proper measurement, such non-diagonal contributions are never observed, i.e., Ω' is absent, and all one sees of Ω(T) is the reduced part Ω^R. This disappearance of the off-diagonal contribution Ω' constitutes the crux of the measurement problem. We will now describe how interaction with the environment in fact serves to eliminate Ω' and leave Ω^R as the final state of the system-device complex.
To establish the result just stated, first we need a theorem on the decay of correlations. Let the correlation entropy, C_AB, between two systems A and B be defined as the difference S_A + S_B − S_AB. Note that C_AB is non-negative, vanishing only when the two systems are uncorrelated, i.e., when ρ_AB = ρ_A ρ_B. Now consider four systems A, B, C and D, initially in the state ρ_ABCD(0) = ρ_AB(0) ρ_C(0) ρ_D(0). The notation implies that systems A and B are initially correlated while all other pairs are initially uncorrelated. Starting at t = 0, system A interacts with system C while system B interacts with system D. Then, using a property of the BGS entropy known as strong subadditivity,4 one can show that C_AB(t) ≤ C_AB(0). In other words, interactions with other systems will in time serve to decrease the correlations initially present between A and B. This intuitively "obvious" result is actually a highly nontrivial theorem that depends on the strong subadditivity property of entropy, itself a profound property of the BGS entropy.
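The decay of correlations can be seen in a minimal example: a fully correlated pair AB (a Bell state) whose members each interact with an uncorrelated environment qubit. The following Python sketch is an added illustration (the partial-swap interaction is an assumption of this sketch, chosen only because it is a simple coupling):

```python
import numpy as np

def S(rho):
    """von Neumann entropy from eigenvalues."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

# Bell pair AB; C and D are uncorrelated, maximally mixed environment qubits.
bell = np.zeros(4); bell[0] = bell[3] = 1 / np.sqrt(2)
rho_ABCD = np.kron(np.outer(bell, bell), np.eye(4) / 4)          # qubit order A,B,C,D

# Reorder qubits to (A, C, B, D) so that each interacting pair is adjacent.
t = rho_ABCD.reshape([2] * 8).transpose(0, 2, 1, 3, 4, 6, 5, 7)
rho0 = t.reshape(16, 16)

# Partial-swap couplings A<->C and B<->D.
theta = 0.4
SW = np.eye(4)[[0, 2, 1, 3]]
U2 = np.cos(theta) * np.eye(4) + 1j * np.sin(theta) * SW
U = np.kron(U2, U2)

def C_AB(rho):
    """Correlation entropy S_A + S_B - S_AB; qubit order (A, C, B, D)."""
    t = rho.reshape([2] * 8)
    rho_AB = np.einsum('acbdxcyd->abxy', t).reshape(4, 4)        # trace out C and D
    r4 = rho_AB.reshape(2, 2, 2, 2)
    rho_A = np.einsum('abxb->ax', r4)
    rho_B = np.einsum('abay->by', r4)
    return S(rho_A) + S(rho_B) - S(rho_AB)

before = C_AB(rho0)                      # 2 ln 2 for the Bell pair
after = C_AB(U @ rho0 @ U.conj().T)
assert 0 < after < before                # C_AB(t) <= C_AB(0)
```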
A measuring device, or more accurately the part of it that directly interacts
with the quantum system, has a very large cross section for interaction with the rest
of the universe, or its environment. Therefore, although the system-device interac-
tion ceases after the establishment of correlations in f2(T), the device continues to
interact with the environment. According to the result established above, on the
other hand, this causes the system-device correlations to decay, so that the final
value of the system-device correlation entropy will be the minimum consistent with
the prevailing conditions.
A closer examination of the structure of Ω(T) in Eq. (6), together with the conditions that the measuring device must obey, reveals that the minimum system-device correlation entropy is reached when Ω' = 0, i.e., when Ω(T) is in fact reduced to Ω^R, thus establishing the fact that it is the interaction with the environment which brings about the reduction of the state of the system. It is now clear why reduction appears to be totally inexplicable when viewed in the context of system-device interactions only.
The reduction process described above entails an entropy increase, given by ΔS = S(Ω^R) − S(Ω). A straightforward calculation of this entropy increase gives8

ΔS = S(Σ_i π̂_i ρ̂ π̂_i) − S(ρ̂) ,  (7)

with the obvious interpretation that the entropy increase comes about as a result of reducing the initial state of the system ρ̂ to the final state Σ_i π̂_i ρ̂ π̂_i, with the off-diagonal elements removed; cf. Eq. (5).
As an application of Eq. (7), we will consider the measurement (in one dimension) of the momentum of a system initially in a pure (hence idealized) Gaussian state with a momentum spread equal to μ. The measuring device will be assumed to have uniform bins of size Δp (roughly equal to the resolution of the momentum analyzer). Then one finds from Eq. (7) that ΔS = −Σ_i P_i ln P_i, where

P_i = (πμ²)^{−1/2} ∫ dp exp(−p²/μ²) .
Here the integral extends over the ith bin. Note that ΔS is precisely what we named measurement entropy before.
Consider now the following limiting values of ΔS. For a crude measurement, Δp ≫ μ, practically all events will turn up in one channel (or bin), say, channel k. Then we have P_k ≈ 1, P_i ≈ 0 for i ≠ k, and we find ΔS ≈ 0, exactly as expected. For a high-resolution analyzer, on the other hand, Δp ≪ μ, so that

P_i ≈ (πμ²)^{−1/2} Δp exp(−p_i²/μ²) ,
and we find

ΔS ≈ (1/2)(1 + ln π) + ln(μ/Δp) ,   (μ ≫ Δp). (8)
Thus the entropy increase for reducing the state of the system grows indefinitely
as the resolution of the momentum analyzer is increased. Again this is exactly as
expected, and points to the impossibility of producing pure states by means of a
(necessarily) finite preparation procedure.
It should be pointed out at this point that Eq. (7) actually represents a lower
limit to the amount of entropy increase in a measurement, and that the actual value
can be far larger than this theoretical minimum.
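Both limiting behaviors of ΔS, including Eq. (8), can be reproduced numerically. A Python sketch (illustrative; `delta_S` is a helper name introduced here):

```python
import numpy as np
from math import erf, log, pi

def delta_S(mu, dp, pmax=None):
    """Entropy increase -sum_i P_i ln P_i for a Gaussian of spread mu, binned with resolution dp."""
    if pmax is None:
        pmax = 12 * mu
    edges = np.arange(-pmax, pmax + dp, dp)
    # P_i = integral of (pi*mu^2)^(-1/2) exp(-p^2/mu^2) over the ith bin, via erf.
    P = np.array([0.5 * (erf(b / mu) - erf(a / mu))
                  for a, b in zip(edges[:-1], edges[1:])])
    P = P[P > 1e-300]
    return float(-np.sum(P * np.log(P)))

mu = 1.0
# Crude analyzer (dp >> mu): essentially one occupied bin, so dS ~ 0.
assert delta_S(mu, 50.0) < 1e-9
# High-resolution analyzer (dp << mu): dS approaches Eq. (8).
dp = 0.001
assert abs(delta_S(mu, dp) - (0.5 * (1 + log(pi)) + log(mu / dp))) < 1e-3
```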
CONCLUDING REMARKS
In the preceding sections we have described certain basic ideas about the role and
meaning of entropy in quantum mechanics, and have outlined a number of appli-
cations of these ideas to long-standing problems in quantum theory and statistical
mechanics. Among these are the quantum maximum uncertainty/entropy principle,
multitime measurements and time-energy uncertainty relations, the reversibility
problem of statistical mechanics, and the measurement problem of quantum the-
ory. On the basis of the results obtained so far (the details of which can be found in
the original papers cited above), it should be amply clear that entropy, properly de-
fined and applied, is a most powerful notion for dealing with foundational problems in quantum mechanics. As remarked earlier, this is because the manner in which
measurements yield information about a quantum system is unavoidably statisti-
cal in nature, thus entailing all the usual consequences of dealing with incomplete
information, including entropy.
In retrospect, it is rather remarkable how the dynamics of elementary, micro-
scopic systems of a few degrees of freedom can turn into a statistical problem of
considerable complexity when dealing with measured data.
ACKNOWLEDGMENTS
This work was supported by the National Science Foundation under Grant No.
PHY-8513367 and by a grant from California State University, Sacramento.
REFERENCES
1. Blankenbecler, R., and H. Partovi. "Uncertainty, Entropy, and the Statistical
Mechanics of Microscopic Systems." Phys. Rev. Lett. 54 (1985):373-376.
2. Deutsch, D. "Uncertainty in Quantum Measurements." Phys. Rev. Lett. 50
(1983):631-633.
3. Jaynes, E. T. "Information Theory and Statistical Mechanics." Phys. Rev.
106 (1957):620-630.
4. Lieb, E., and M. B. Ruskai. "A Fundamental Property of Quantum-
Mechanical Entropy." Phys. Rev. Lett. 30 (1973):434-436.
5. Partovi, H. "Entropic Formulation of Uncertainty for Quantum Measure-
ments." Phys. Rev. Lett. 50 (1983):1882-1885.
6. Partovi, H., and R. Blankenbecler. "Time in Quantum Measurements." Phys. Rev. Lett. 57 (1986):2887-2890.
7. Partovi, H. "Quantum Thermodynamics." Phys. Lett. A 137 (1989):440-444.
8. Partovi, H. "Irreversibility, Reduction, and Entropy Increase in Quantum
Measurements." Phys. Lett. A 137 (1989):445-450.
9. Peres, A. "When is a Quantum Measurement?" Am. J. Phys. 54 (1986):688-692.
10. Shannon, C. "A Mathematical Theory of Communication." Bell Syst. Tech.
J. 27 (1948):379-423, 623-655.
11. Wehrl, A. "General Properties of Entropy." Rev. Mod. Phys. 50 (1978):221-
260.
12. Zeh, H. D. "On the Irreversibility of Time and Observation in Quantum The-
ory." In Foundations of Quantum Mechanics, edited by B. d'Espagnat. New
York: Academic Press, 1971.
13. Zurek, W. H. "Environment-Induced Superselection Rules." Phys. Rev. D 26
(1982):1862-1880.
Einstein Completion of Quantum Mechanics Made Falsifiable

O. E. Rössler
Institute for Physical and Theoretical Chemistry, University of Tübingen, 7400 Tübingen, West Germany
The big mystery in the formalism of quantum mechanics is still the measure-
ment problem—the transition from the linear probability amplitude formalism to
the nonlinearly projected individual events.10 In contrast, the "relativistic measurement problem," in which everything is compounded by relativistic considerations, has so far resisted all attempts at formalization.1,2,6,12,16
In the context of the ordinary measurement problem, the paradigm of correlated photons3,4,11,19 has already proven an invaluable empirical tool. While the
two individual projection results remain probabilistic, they nevertheless are strictly
correlated across the pair—as if one and the same particle were available twice!
Therefore, the question arises of whether the same tool may not be transplanted
away from its original domain (that of confirming quantum mechanics) in order to
be used as a probe in the unknown terrain of relativistic quantum mechanics.
A similar proposal was once made, with disappointing results. Einstein8,5
had thought of subjecting two correlated particles to a condition in which both
are causally insulated (space-like separated) in order to, at leisure, collect from
each the result of a different projected property of the original joint wave function.
Since, in this way, two incompatible (noncommuting) measurement results could be
obtained from the same wave function, his declared aim was to "complete" quantum
mechanics in this fashion. To everyone's surprise, Bell5 was able to demonstrate that
the two particles remain connected "non-locally." They behave exactly as if the
distant measurement performed on the first had been performed twice, namely on
the second particle, too, in the form of a preparatory measurement. More technically
speaking, the distant measurement throws both particles into the same eigenstate
(reduction at a distance). The achievement of Bell was to show that this implication
of the quantum-mechanical formalism is indeed incompatible with any pre-existing
set of properties of the two particles that would make the effect at a distance only
an apparent one. A painstaking analysis of all relative angles and their attendant
correlations was the key step. Thus, Einstein's intuition was proven wrong for once.
No more than one "virgin reduction" of the original wave function need be assumed.
However, the mistake made by Einstein may have been smaller than meets the eye. His specific proposal to use relativistic insulation (space-like separation) as a means to "fool" quantum mechanics was presumably chosen for didactic reasons
only. The larger idea—to use relativity theory for the same purpose—is still unconsummated. There may exist a second mechanism of causal separation between two space-like separated events that, when applied to the two measurements, might indeed "decouple" them, so that quantum mechanics could be fooled indeed—or else would have to respond with an even more vigorous and surprising defense.
Such a second mechanism, in fact, exists, as is well known. The temporal ordering between two space-like separated events (their causal relationship, so to speak) is not a relativistic invariant. The very "connection" discovered by Bell makes this result, which ordinarily poses no threat to causality, conducive to carrying an unexpected power.
Let us illustrate the idea in concrete terms (Figure 1). The two measuring
stations used in the Aspect experiment3,4 are here assumed to have been put in
motion relative to each other. Moreover, the two distances are chosen so carefully
that exactly the above condition (reversal of priority relations as to which mea-
suring device is closer to the point of emission in its own frame) is fulfilled. In
consequence, each half experiment is identical with an ordinary Aspect experiment
in which the most important measurement (the first) has already taken place. Only
after this first reduction has been obtained in the frame in question will there be
a second measurement. This second measurement, of course, will be performed by
a moving (receding) measurement device. But, since by that time the joint reduction
Completion of Quantum Mechanics 369
of the photon's state has already been accomplished, the other photon already
possesses a well-defined spin. Hence, by the well-known fact that a photon's spin
yields the same fixed outcome whatever the state of head-on motion of the
measuring device,7 there is indeed no difference relative to an ordinary Aspect
experiment. A "catch-22" situation has therefore been achieved.
The question is open how nature will respond to this experiment. Let us, there-
fore, first check whether it can, in fact, be done (Figure 2). Here for simplicity
only two frames are assumed, one stationary and the other moving. Both are
completely symmetric. If, as shown, detector delays of 3 nanoseconds (d = 1 light
meter) are assumed, and if, in addition, satellite velocity (v = 11 km/sec) is assumed
for the second frame, so that v/c = 4 × 10⁻⁵ ≪ 1, one sees from the diagram that
s = d × c/v = 2.5 × 10⁴ m = 25 km ≈ 16 mi. This amounts to a rather "long-distance"
version of the Aspect experiment. The weak intensity of the source used in the
latter3,4 would certainly forbid such an extension. Moreover, the two photons are
not simultaneously emitted in this experiment.3,4,11 Therefore, it is fortunate
that a more recent experiment exists which is both high-intensity and of the simul-
taneously emitting type since the two photons are generated in parametric down
conversion before being superposed.13 Therefore, the present experiment can be
actually implemented in two steps, by first scaling up the Ou-Mandel experiment,
370 O. E. Rössler
FIGURE 2 The experiment of Figure 1, redrawn in more detail in the two frames x'
and x". Note that the slope of the x" axis equals v/c, but also is equal to d/s (with
d measured in meters) as long as v is much smaller than c. With d and v fixed, s (the
minimum distance to the source from either measuring device in its own frame) can,
therefore, be calculated. Compare text.
and by then making one of the two measuring devices (analyzer plus detector)
spacebound.
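The distance estimate above is simple arithmetic and can be checked directly; a quick sketch (all input values are the ones quoted in the text, c is the standard value):

```python
# Check of the arithmetic above (all inputs from the text; c standard).
c = 2.998e8    # speed of light, m/s
v = 11e3       # satellite velocity, m/s
d = 1.0        # detector delay of ~3 ns, i.e., one light-meter

beta = v / c   # ~3.7e-5, rounded to 4e-5 in the text
s = d * c / v  # minimum distance of each detector from the source, m

print(beta, s) # s is ~2.7e4 m; the text's 2.5e4 m uses the rounded beta
```

The rounding of v/c to 4 × 10⁻⁵ is what gives the text's 25 km figure; the exact ratio gives roughly 27 km, the same order either way.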
This essentially concludes the message of the present note. What remains is
to make the connection to other work. While the present experiment is new, Shi-
mony17 recently looked at a rather similar case. His mathematical analysis (done
without pictures) fits in perfectly as a complement to the present context. The
only difference: He did not differentiate between the two measuring devices being
mutually at rest or not. He, therefore, could rely entirely on the Bell experiment,
with his added analysis only having the character of a gedanken experiment that
cannot and need not be done since all the facts are available anyhow. His conclu-
sion nevertheless was quite revolutionary since it culminated in the conjecture that
the quantum-mechanical notion of a measured eigenstate may have to be redefined
such that it becomes frame-dependent.
Shimony's conclusion had been reached before by Schlieder16 and Aharonov
and Albert.2 These authors applied a well-known axiom from relativistic quantum
mechanics (that two space-like separated measurements always commute) to
correlated particles, arriving at the theorem that the same particle may possess
multiple quantum states (density matrices) at the same point in spacetime. Specifically,
these states form—in accordance with earlier proposals of Dirac, Tomonaga and
"save" the commutation relations (by. excluding joint reductions) and spell the
end of the doctrine of an observer-invariant spacetime (since the state of motion
of a measuring device could affect photon spin). However, it would be much too
"heavy" to be seriously proposed as a prediction. All quantitative theory available
would thereby be contradicted. There is nothing on the horizon that could seriously
threaten the invariance of quantum spacetime. What is new is only that the latter
has become empirically confirmable (and therefore also "in principle falsifiable")
for the first time.
To conclude, a new quantum experiment feasible with current technology has
been proposed. The status of the commutation relations in the relativistic
measurement problem can be decided. Specifically, the "space-borne" Ou-Mandel
experiment will show whether (1) the current idea that there exists an
observer-invariant quantum spacetime can be upheld (Einstein completion) or (2)
the observer is reinforced in his "participatory"20 role in a whole new context.
ACKNOWLEDGMENTS
I thank Wojciech Zurek and Jens Meier for discussions and John Bell for a correction
concerning Figure 2.
Added in Proof: In 1984 Peres15 used a diagram similar to Figure 1, which was
redrawn for this paper.
REFERENCES
1. Aharonov, Y., and D. Z. Albert. Phys. Rev. D24 (1981):359.
2. Aharonov, Y., and D. Z. Albert. Phys. Rev. D29 (1984):228.
3. Aspect, A., P. Grangier, and G. Roger. Phys. Rev. Lett. 49 (1982):91.
4. Aspect, A., J. Dalibard, and G. Roger. Phys. Rev. Lett. 49 (1982):1804.
5. Bell, J. S. Physics 1 (1964):195.
6. Bloch, I. Phys. Rev. 156 (1967):1377.
7. Bjorken, J. D., and S. D. Drell. Relativistic Quantum Mechanics. New York:
McGraw Hill, 1964.
8. Einstein, A. In Institut International de Physique Solvay, Rapport et Discus-
sions du 5e Conseil. Paris, 1928, 253.
9. Einstein, A., B. Podolsky, and N. Rosen. Phys. Rev. 47 (1935):777.
10. Jammer, M. The Philosophy of Quantum Mechanics, the Interpretations of
Quantum Mechanics in Historical Perspective. New York: Wiley, 1974.
11. Kocher, C. A., and E. D. Commins. Phys. Rev. Lett. 18 (1967):575.
12. Landau, L. D., and R. Peierls. Z. Physik 69 (1931):56.
13. Ou, Z. Y., and L. Mandel. Phys. Rev. Lett. 61 (1988):50.
14. Park, D., and H. Margenau. In Perspectives in Quantum Theory, edited by
W. Yourgrau and A. Van der Merwe. Boston: MIT Press, 1971, 37.
15. Peres, A. Amer. J. Phys. 52 (1984):644.
16. Schlieder, S. Commun. Math. Phys. 7 (1968):305.
17. Shimony, A. In Quantum Concepts in Space and Time, edited by R. Penrose
and C. J. Isham. Oxford: Clarendon, 1986, 182, 193-195.
18. von Neumann, J. In Mathematical Foundations of Quantum Mechanics.
Princeton: Princeton University Press, 1955, 225-230.
19. Wheeler, J. A. Ann. N. Y. Acad. Sci. 48 (1946):219.
20. Wheeler, J. A. "Genesis and Observership." In Foundational Problems in Spe-
cial Sciences, edited by R. E. Butts and K. J. Hintikka. Dordrecht: Reidel,
1977.
21. Zeeman, E. C. J. Math. Phys. 5 (1964):490.
J. W. Barrett
Department of Physics, The University, Newcastle upon Tyne NE1 7RU, United Kingdom
QUANTUM MECHANICS
One fairly standard view of quantum mechanics is the following (see the contri-
butions by Omnes and Gell-Mann in this volume): After irreversible coupling to
the environment, the properties of an object are consistent with the idea that one
of the alternatives for its behavior has occurred in a definite way. The clause "in
a definite way" is important here. I also want to stress that I am saying that the
properties of an object become consistent with the idea that something definite has
occurred, but no further; I am not pinning down a time at which one alternative is
chosen.
I would like to consider a universe consisting of a small number of finite quantum
systems, which we think of as exchanging information. There is a difficulty, because
if one system gains information about a second one through an interaction, there is
the possibility that the interaction might be "undone" later on, by some interaction
which makes use of the quantum correlations between the two systems. Thus, the
information gained has only a definite character so long as we forgo the further
"use" of some of the possible quantum interactions. This definiteness which I am
talking about is, I think, captured by the idea of a consistent logic, introduced by
Omnes, and having its roots in Griffiths' idea3 of a consistent history for a quantum
system.
In the real world, there is always a large-scale, decohering environment involved
when we, human beings, gain information: our own bodies, if nothing else. In most
cases, though, it is an inanimate part of the environment. Then, it becomes ef-
fectively impossible to show up the coherence between the small system and the
macroscopic object, because the experimental manipulations involved get too com-
plicated. They would involve a huge amount of information. Incidentally, this shows
up some sort of link between information, of an algorithmic kind (how to program a
robot doing the experiments), and the measure of correlation information discussed
by Everett1 which uses the density matrix.
Because the environment "decoheres" interactions in the real world, the present
discussion is optional from a strictly practical viewpoint. However, I am unhappy
with this resolution of the problem, and think that quantum mechanics ought to
make sense without the necessary inclusion of such very large systems.
this very literal way, one is bound to regain the idea of the set 0 for the two-
spin experiment, and not be able to reconstruct quantum mechanics. I think other
axioms of set theory have an interpretation in terms of physical operations based
on a pre-quantum understanding of physics.
Thus we have been accustomed to abandoning the goal of understanding quan-
tum objects entirely in terms of classical set-theoretic constructions, but speak
about them in roundabout ways. This is the source of the tension in debates about
quantum theory. Omnes has clarified exactly to what extent one can use set theo-
retic constructs in quantum theory in a direct way, and where the inconsistencies
set in. To my mind this is a very important advance. However, I feel that there ought
to be a set-theoretic language which applies directly to all quantum interactions.
Perhaps it is along the lines Finkelstein has suggested.2
disguised because of the law of large numbers)? This is the type of question which
Penrose raised when he invented spin networks.4
Keeping the macroscopic objects of finite size has other effects. The angle is
effectively a property of the spatial relationship between the polarizer and analyzer.
A finite size for these means a cutoff in the spectrum of the angular momentum for
each object, and hence some uncertainty in the relative angle between the two due
to the quantum uncertainty principle. Thus the bit string that one gets in this case,
from writing 0 when a photon fails to pass through and 1 if it does pass through,
does not define an angle in the classical sense. What I mean is that there is not a
continuum range of values which the angle, as defined by the quantum "surveying,"
can be said with certainty to take.
Thus we see that the continuum nature of space-time, the continuum nature of
the space of quantum wavefunctions, and the usual assumption of the existence of
infinitely large and massive reference bodies, are inextricably linked. In particular,
we see that the quantum wavefunction is not just a property of the photon spin. It
is a property of the space-time measurement as well as of the photon itself.
The implications of this for understanding the concept of information, in an
algorithmic sense, in quantum theory are something one cannot ignore. If one wants
to deal with a finite amount of information, one has to use systems of a finite size
throughout; then one cannot use continuum concepts such as a wavefunction in the
conventional sense. I feel that a satisfying resolution of this problem should also
be one that solves the puzzles I outlined earlier about the relationship of quantum
properties to classical sets.
380 J. W. Barrett
REFERENCES
1. DeWitt, B. S., and N. Graham. The Many-Worlds Interpretation of Quantum
Mechanics. Princeton: Princeton University Press, 1973.
2. Finkelstein, D. "Quantum Net Dynamics." Intl. J. Theor. Phys. (1989). To
appear.
3. Griffiths, R. B. "Correlations in Separated Quantum Systems: A Consistent
History Analysis of the EPR Problem." Am. J. Phys. 55 (1987):11-17.
4. Penrose, R. "Angular Momentum: An Approach to Combinatorial Space-
Time." In Quantum Theory and Beyond, edited by T. Bastin. London: Cam-
bridge University Press, 1971.
E. T. Jaynes
Wayman Crow Professor of Physics, Washington University, St. Louis, MO 63130
For some sixty years, it has appeared to many physicists that probability
plays a fundamentally different role in quantum theory than it does in
statistical mechanics and analysis of measurement errors. A common notion
is that probabilities calculated within a pure state have a different character
than the probabilities with which different pure states appear in a mixture
or density matrix. As Pauli put it, the former represents "...eine prinzipielle
Unbestimmtheit, nicht nur Unbekanntheit" (a fundamental indeterminacy, not mere
ignorance). But this viewpoint leads to so
many paradoxes and mysteries that we explore the consequences of the
unified view—all probability signifies only human information. We examine
in detail only one of the issues this raises: the reality of zero-point energy.
Hilbert space. But, fundamentally, every EM field is a source field from somewhere;
therefore, it is already an operator on the space of perhaps distant sources. So why
do we quantize it again, thereby introducing an infinite number of new degrees of
freedom for each of an infinite number of field modes?
One can hardly imagine a better way to generate infinities in physical
predictions than by having a mathematical formalism with (∞)² more degrees of freedom
than are actually used by Nature. The issue is: should we quantize the matter and
fields separately, and then couple them together afterward, or should we write down
the full classical theory with both matter and field and with the field equations in
integrated form, and quantize it in a single step? The latter procedure (assuming
that we could carry it out consistently) would lead to a smaller Hilbert space.
The viewpoint we are suggesting is quite similar in spirit to the Wheeler-
Feynman electrodynamics, in which the EM field is not considered to be a "real"
physical entity in itself, but only a kind of information storage device. That is,
the present EM field is a "sufficient statistic" that summarizes all the information
about past motion of charges that is relevant for predicting their future motion.
It is not enough to reply that "The present QED procedure must be right
because it leads to several very accurate predictions: the Lamb shift, the anoma-
lous moment, etc." To sustain that argument, one would have to show that the
quantized free field actually plays an essential role in determining those accurate
numbers (1058 MHz, etc.). But their calculation appears to involve only the
Feynman propagators; mathematically, the propagator D(x − y) in Eq. 1 is equally well
a Green's function for the quantized or unquantized field.
The conjecture suggests itself, almost irresistibly, that those accurate experi-
mental confirmations of QED come from the local source fields, which are coherent
with the local state of matter. This has been confirmed in part by the "source-field
theory" that arose in quantum optics about 15 years ago.1,15,21 It was found that, at
least in lowest nonvanishing order, observable effects such as spontaneous emission
and the Lamb shift, can be regarded as arising from the source field which we had
studied already in classical EM theory, where we called it the "radiation reaction
field." Some equations illustrating this in a simpler context are given below.
In these quantum optics calculations, the quantized free field only tags along,
putting an infinite uncertainty into the initial conditions (that is, a finite uncer-
tainty into each of an infinite number of field modes) and thus giving us an infinite
"zero-point energy," but not producing any observable electrodynamic effects. One
wonders, then: Do we really need it?
1927 vintage quantum theory, where the conceptual problems of the "Copenhagen
interpretation" refuse to go away, but are brought up for renewed discussion by
every new generation (much to the puzzlement, we suspect, of the older generation
who thought these problems were all solved). Starting with the debates between
Bohr and Einstein over sixty years ago, different ways of looking at quantum theory
persist in making some see deep mysteries and contradictions in need of resolution,
while others insist that there is no difficulty.
Defenders of the Copenhagen interpretation have displayed a supreme self-
confidence in the correctness of their position, but this has not enabled them to
give the rest of us any rational explanations of why there is no difficulty. Richard
Feynman at least had the honesty to admit, "Nobody knows how it can be that
way."
We doubters have not shown so much self-confidence; nevertheless, all these
years, it has seemed obvious to me—for the same reasons that it did to Einstein
and Schrödinger—that the Copenhagen interpretation is a mass of contradictions
and irrationality and that, while theoretical physics can of course continue to make
progress in the mathematical details and computational techniques, there is no hope
of any further progress in our basic understanding of Nature until this conceptual
mess is cleared up.
Let me stress our motivation: if quantum theory were not successful pragmat-
ically, we would have no interest in its interpretation. It is precisely because of
the enormous success of the QM mathematical formalism that it becomes crucially
important to learn what that mathematics means. To find a rational physical in-
terpretation of the QM formalism ought to be considered the top priority research
problem of theoretical physics; until this is accomplished, all other theoretical re-
sults can only be provisional and temporary.
This conviction has affected the whole course of my career. I had intended
originally to specialize in Quantum Electrodynamics, but this proved to be impos-
sible. Whenever I look at any quantum-mechanical calculation, the basic craziness
of what we are doing rises in my gorge and I have to try to find some different way
of looking at the problem that makes physical sense. Gradually, I came to see that
the foundations of probability theory and the role of human information have to
be brought in, and so I have spent many years trying to understand them in the
greatest generality.
The failure of quantum theorists to distinguish in calculations between several
quite different meanings of "probability," between expectation values and actual
values, makes us do things that are unnecessary and fail to do things that are nec-
essary. We fail to distinguish in our verbiage between prediction and measurement.
For example, two famous vague phrases—"It is impossible to specify..." and "It is
impossible to define..."—can be interpreted equally well as statements about
prediction or about measurement. Thus, the demonstrably correct statement that the
present theory cannot predict something becomes twisted into the almost certainly
false claim that the experimentalist cannot measure it!
We routinely commit the Mind Projection Fallacy of projecting our own
thoughts out onto Nature, supposing that creations of our own imagination are real
386 E. T. Jaynes
properties of Nature, or our own ignorance signifies some indecision on the part of
Nature. This muddying up of the distinction between reality and our knowledge of
reality is carried to the point where we find some asserting the objective reality of
probabilities, while denying the objective reality of atoms! These sloppy habits of
language have tricked us into mystical, pre-scientific standards of logic, and leave
the meaning of any QM result simply undefined. Yet we have managed to learn how
to calculate with enough art and tact so that we come out with the right numbers!
The main suggestion we wish to make is that how we look at basic probability
theory has deep implications for the Bohr-Einstein positions. Only within the past
year has it appeared to the writer that we might be able finally to resolve these
matters in the happiest way imaginable: a reconciliation of the views of Bohr and
Einstein in which we can see that they were both right in the essentials, but just
thinking on different levels.
Einstein's thinking is always on the ontological level traditional in physics,
trying to describe the realities of Nature. Bohr's thinking is always on the episte-
mological level, describing not reality but only our information about reality. The
peculiar flavor of his language arises from the absence of words with any ontolog-
ical import; the notion of a "real physical situation" was just not present and he
gave evasive answers to questions of the form: "What is really happening?" Eugene
Wigner24 was acutely aware of and disturbed by this evasiveness when he remarked:
These Copenhagen people are so clever in their use of language that, even
after they have answered your question, you still don't know whether the
answer was "yes" or "no"!
J. R. Oppenheimer, more friendly to the Copenhagen viewpoint, tried to explain
it in his lectures in Berkeley in the 1946-47 school year. Oppy anticipated multiple-
valued logic when he told us:
Consider an electron in the ground state of the hydrogen atom. If you ask,
"Is it moving?," the answer is "no." If you ask, "Is it standing still?," the
answer is "no."
Those who, like Einstein (and, up till recently, the present writer) tried to
read ontological meaning into Bohr's statements, were quite unable to understand
his message. This applies not only to his critics but equally to his disciples, who
undoubtedly embarrassed Bohr considerably by offering such exegeses as, "Instan-
taneous quantum jumps are real physical events," or "The variable is created by
the act of measurement," or the remark of Pauli quoted above, which might be
rendered loosely as, "Not only are you and I ignorant of x and p; Nature herself
does not know what they are."
Critics who tried to summarize Bohr's position sarcastically as, "If I can't
measure it, then it doesn't exist!," were perhaps closer in some ways to his actual
thinking than were his disciples. Of course, while Bohr studiously avoided all as-
sertions of "reality," he did not carry this to the point of denying reality; he was
Probability in Quantum Theory 387
merely silent on the issue, and would prefer to say, simply: "If we can't measure it,
then we can't use it for prediction."
Although Bohr's whole way of thinking was very different from Einstein's,
it does not follow that either was wrong. In the writer's view, all of Einstein's
thinking—in particular the EPR argument—remains valid today, when we take
into account its ontological character. But today, when we are beginning to con-
sider the role of information for science in general, it may be useful to note that
we are finally taking a step in the epistemological direction that Bohr was trying
to point out sixty years ago.
This statement applies only to the general philosophical position that the role
of human information in science needs to be recognized and taken into account
explicitly. Of course, it does not mean that every technical detail of Bohr's work is
to remain unchanged for all time. Our present QM formalism is a peculiar mixture
describing in part laws of Nature and in part incomplete human information about
Nature—all scrambled up together by Bohr into an omelette that nobody has seen
how to unscramble. Yet we think that the unscrambling is a prerequisite for any
further advance in basic physical theory and we want to speculate on the proper
tools to do this.
We suggest that the proper tool for incorporating human information into sci-
ence is simply probability theory—not the currently taught "random variable" kind,
but the original "logical inference" kind of James Bernoulli and Laplace. For histori-
cal reasons explained elsewhere,11 this is often called "Bayesian probability theory."
When supplemented by the notion of information entropy, this becomes a mathe-
matical tool for scientific reasoning of such power and versatility that we think it
will require a century to explore its capabilities. But the preliminary development
of this tool and testing it on simple problems is now fairly well in hand, as described
below.
A job for the immediate future is to see whether, by proper choice of variables,
Bohr's omelette can be seen as a kind of approximation to it. In the 1950's, Richard
Feynman noted that some of the probabilities in quantum theory obey different
rules (interference of path amplitudes) than do the classical probabilities. But more
recently12 we have found that the QM probabilities involved in the EPR scenario are
strikingly similar to the Bayesian probabilities, often identical; we interpret Bohr's
reply to EPR as a recognition of this. That is, Bohr's explanation of the EPR
experiment is a fairly good statement of Bayesian inference. Therefore, the omelette
does have some discernible structure of the kind that we would need in order to
unscramble it.
their own ignorance; it is apparent that their tactics amount to mere chanting of
ideological slogans, while simply ignoring the relevant, demonstrable technical facts.
But the failure of our critics to find inconsistencies does not prove that our
methods have any positive value for science. Are there any new useful results to be
had from using probability theory as logic? Some are reported in the proceedings
volumes of the Annual (since 1981) MAXENT workshops, particularly the one
in Cambridge, England, in August 1988,12 wherein a generalized Second Law of
Thermodynamics is used in what we think is the first quantitative application of
the second law in biology. But, unfortunately, most of the problems solvable by
pencil-and-paper methods were too trivial to put this issue to a real test; although
the results never conflicted with common sense, neither did they extend it very
far beyond what common sense could see or what "random variable" probability
theory could also derive.
Only recently, thanks to the computer, has it become feasible to solve real,
nontrivial problems of reasoning from incomplete information, in which we use
probability theory as a form of logic in situations where both intuition and "random
variable" probability theory would be helpless. This has brought out the facts in a
way that can no longer be obscured by arguments over philosophy. It is not easy
to argue with a computer printout, which says to us: "Independently of all your
philosophy, here are the facts about what this method actually gives when applied."
The "MAXENT" program developed by John Skilling, Steve Gull, and their
colleagues at Cambridge University, England can maximize entropy numerically in
a space of 1,000,000 dimensions, subject to 2,000 simultaneous constraints. The
"Bayesian" data-analysis program developed by G. L. Bretthorst2 at Washington
University, St. Louis, can eliminate a hundred uninteresting parameters and give
the simultaneous best estimates of twenty interesting ones and their accuracy, or it
can take into account all the parameters in a set of possible theories or "models"
and give us the relative probabilities of the theories in the light of the data. It was
interesting, although to us not surprising, to find that this leads automatically to
a quantitative statement of Occam's Razor: prefer the simpler theory unless the
other gives a significantly better fit to the data.
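Numerical entropy maximization under constraints can be illustrated in miniature (my own choice of example, the classic "Brandeis dice" exercise, not one of the Cambridge or St. Louis computations): find the maximum-entropy distribution over die faces 1..6 whose mean is 4.5. The maximizing distribution has the exponential form p_k ∝ exp(λk), and bisection on λ enforces the mean constraint.

```python
import math

# Maximum-entropy distribution over faces 1..6 with mean 4.5.
faces = range(1, 7)

def mean_for(lam):
    # mean face value of the exponential-family distribution exp(lam*k)
    w = [math.exp(lam * k) for k in faces]
    return sum(k * wk for k, wk in zip(faces, w)) / sum(w)

lo, hi = 0.0, 5.0          # mean_for(0) = 3.5, mean_for(5) is near 6
for _ in range(200):       # bisection on the mean constraint
    mid = 0.5 * (lo + hi)
    if mean_for(mid) < 4.5:
        lo = mid
    else:
        hi = mid

lam = 0.5 * (lo + hi)
w = [math.exp(lam * k) for k in faces]
p = [wk / sum(w) for wk in w]
print(p)   # increasing weights, tilted toward the high faces
```

The production programs mentioned in the text do the same kind of constrained maximization, but in a space of a million dimensions rather than six.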
Many computer printouts have now been made at Cambridge University, of
image reconstructions in optics and radio astronomy, and at Washington University
in analysis of economic, geophysical, and nuclear magnetic resonance data. The
results were astonishing to all of us; they could never have been found, or guessed,
by hand methods.
In particular, the Bretthorst programs3,4,5 extract far more information from
NMR data (where the ideal sinusoidal signals are corrupted by decay) than could
the previously used Fourier transform methods. No longer does decay broaden the
spectrum and obscure the information about oscillation frequencies; the result is
an order-of-magnitude-better resolution.
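The effect of decay on resolution can be caricatured in a few lines (my own toy construction, not Bretthorst's programs; all signal parameters are invented): a decaying sinusoid's Fourier line is broadened to a width of about 1/(πτ), yet a grid search with the decaying model pinpoints the frequency far inside that width.

```python
import math

# Decaying sinusoid: exp(-t/tau) * sin(2*pi*f0*t), sampled at fs for T s.
f0, tau, fs, T = 10.0, 0.2, 200.0, 2.0   # invented test values
n = int(fs * T)
t = [k / fs for k in range(n)]
y = [math.exp(-tk / tau) * math.sin(2 * math.pi * f0 * tk) for tk in t]

def fit_stat(f):
    # correlation of the data with the decaying model at trial frequency f
    return abs(sum(yk * math.exp(-tk / tau) * math.sin(2 * math.pi * f * tk)
                   for tk, yk in zip(t, y)))

grid = [8.0 + 0.01 * i for i in range(401)]   # trial frequencies, 8..12 Hz
f_hat = max(grid, key=fit_stat)

fourier_width = 1.0 / (math.pi * tau)   # decay-broadened linewidth, ~1.6 Hz
print(f_hat, fourier_width)
```

The matched search lands on the true 10 Hz to within the grid spacing, even though the Fourier line it sits under is more than a hundred times wider.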
Less spectacular numerically, but equally important in principle, they yield fun-
damental improvements in extracting information from economic time series when
the data are corrupted by trend and seasonality; no longer do these obscure the
information that we are trying to extract from the data. Conventional "random
variable" probability theory lacks the technical means to eliminate nuisance pa-
rameters in this way, because it lacks the concept of "probability of a hypothesis."
In other words, there is no need to shout: it is now a very well-demonstrated
fact that, after all criticisms of its underlying philosophy, probability theory inter-
preted and used as the logic of human inference does rather well in dealing with
problems of scientific reasoning—just as James Bernoulli and Laplace thought it
would, back in the 18th Century.
Our probabilities and the entropies based on them are indeed "subjective" in
the sense that they represent human information; if they did not, they could not
serve their purpose. But they are completely "objective" in the sense that they
are determined by the information specified, independently of anyone's personality,
opinions, or hopes. It is "objectivity" in this sense that we need if information is
ever to be a sound basis for new theoretical developments in science.
The first difficulty we encounter upon any suggestion that probabilities in quan-
tum theory might represent human information is the barrage of criticism from
those who believe that dispersions (ΔF)² = ⟨F²⟩ − ⟨F⟩² represent experimentally
observable "quantum fluctuations" in F. Some even claim that these fluctuations
are real physical events that take place constantly whether or not any measurement
is being made (although, of course, that does violence to Bohr's position). At the
1966 Rochester Coherence Conference, Roy Glauber assured us that vacuum fluc-
tuations are "very real things" and that any attempts to dispense with EM field
quantization are therefore doomed to failure. It can be reported that he was widely
and enthusiastically believed.
Now in basic probability theory, ΔF represents fundamentally the accuracy
with which we are able to predict the value of F. This does not deny that it may
also be the variability seen in repeated measurements of F, but the point is that
they need not be the same. To suppose that they must be the same is to commit an
egregious form of the Mind Projection Fallacy; the fact that our information is able
to determine F only to five percent accuracy is not enough to make it fluctuate by
five percent! However, it is almost right to say that, given such information, any
observed fluctuations are unlikely to be greater than five percent.
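The distinction can be caricatured numerically (all numbers invented): our information fixes F only to five percent, but the one true value, once fixed by Nature, does not fluctuate at all between ideal measurements.

```python
import random

random.seed(0)
F_predicted, sigma = 100.0, 5.0              # predictive mean and spread
F_true = random.gauss(F_predicted, sigma)    # Nature's single fixed value

measurements = [F_true for _ in range(10)]   # noiseless repeated readings

predictive_spread = sigma                    # uncertainty of our prediction
observed_spread = max(measurements) - min(measurements)  # what we observe

print(predictive_spread, observed_spread)    # 5.0 versus 0.0
```

The predictive spread describes our state of knowledge; the observed spread describes the apparatus. Conflating the two is exactly the fallacy at issue.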
Let us analyze in depth the single example of EM field fluctuations, and show
that (1) the experimental facts do not require vacuum fluctuations to be real events
after all; (2) Bayesian probability at this point is not only consistent with the ex-
perimental facts, it offers us some striking advantages in clearing up past difficulties
that have worried generations of physicists.
density W_zp in space, the Kepler ratio for a planet of mean distance R from the
sun would be changed to
\[ \frac{R^3}{T^2} = \frac{G}{4\pi^2}\left[M_{\rm sun} + \frac{4\pi R^3}{3c^2}\,W_{zp}\right] . \tag{4} \]
Numerical analysis of this shows that, in order to avoid conflict with the observed
Kepler ratios of the outer planets, the upper frequency cutoff for the ZP energy
would have to be taken no higher than optical frequencies.
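The numerical analysis referred to above can be sketched directly. This is an illustrative check, not the paper's own computation; it assumes the standard ZP spectral density ρ_zp(ω) = ħω³/2π²c³ (so the integrated density up to a cutoff ω_c is ħω_c⁴/8π²c³), and approximate CGS values for the solar mass and Neptune's orbit:

```python
import numpy as np

# Hedged numerical sketch: compare the gravitating mass of zero-point
# energy enclosed within Neptune's orbit against the solar mass, for an
# optical cutoff and a Compton cutoff.  CGS units; all values approximate.

hbar  = 1.055e-27      # erg s
c     = 3.0e10         # cm/s
M_sun = 1.99e33        # g
R_nep = 4.5e14         # cm, Neptune's mean orbital radius (approximate)

def zp_mass_fraction(w_cut):
    """Fractional perturbation of the Kepler ratio R^3/T^2 in Eq. (4):
    the ZP mass enclosed within R_nep divided by the solar mass."""
    W_zp = hbar * w_cut**4 / (8.0 * np.pi**2 * c**3)      # erg/cm^3
    M_zp = (4.0 * np.pi / 3.0) * R_nep**3 * W_zp / c**2   # g
    return M_zp / M_sun

frac_optical = zp_mass_fraction(3.0e15)   # optical cutoff ~ 3e15 rad/s
frac_compton = zp_mass_fraction(7.8e20)   # Compton cutoff m_e c^2 / hbar

print(f"optical cutoff: dM/M ~ {frac_optical:.1e}")
print(f"Compton cutoff: dM/M ~ {frac_compton:.1e}")
```

With the optical cutoff the perturbation is utterly negligible, while the ω⁴ growth makes the Compton cutoff exceed the solar mass itself, in line with the statement that the solar system would be disrupted.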
But attempts to account for the Lamb shift by ZP fluctuations would require
a cutoff thousands of times higher, at the Compton wavelength. The gravitational
field from that energy density would not just perturb the Kepler ratio; it would
completely disrupt the solar system as we know it.
The difficulty would disappear if one could show that the aforementioned ef-
fects have a different cause, and ZP field energy is not needed to account for any
experimental facts. Let us try first with the simplest effect, spontaneous emission.
The hypothesized zero-point energy density in a frequency band Δω is
\[ W_{zp} = \rho_{zp}(\omega)\,\Delta\omega = \frac{\hbar\omega^3}{2\pi^2 c^3}\,\Delta\omega . \tag{5} \]
An atom with the spontaneous emission rate
\[ A = \frac{4p^2\omega_0^3}{3\hbar c^3} , \tag{6} \]
where p is the dipole moment matrix element for the transition, sees this over an
effective bandwidth
\[ \Delta\omega = \frac{\int I(\omega)\,d\omega}{I(\omega_0)} = \frac{\pi A}{2} , \tag{7} \]
the line shape being the Lorentzian
\[ I(\omega) \propto \frac{1}{(\omega-\omega_0)^2 + (A/2)^2} . \tag{8} \]
The effective zero-point energy density that the radiating atom actually sees (the
electric part of the ZP energy in the one field component along the dipole) is then
\[ W_{zp,\,\rm eff} = \tfrac{1}{6}\,\rho_{zp}(\omega_0)\,\Delta\omega = \frac{p^2\omega_0^6}{18\pi c^6}\ \text{ergs/cm}^3 , \tag{9} \]
and it seems curious that Planck's constant has cancelled out. This indicates the
magnitude of the electric field that a radiating atom sees according to the ZP theory.
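The effective-bandwidth integral in Eq. (7) is easy to verify numerically. A minimal sketch with arbitrary parameters (not taken from the paper), checking that ∫I(ω)dω / I(ω₀) = πA/2 for the Lorentzian line shape of Eq. (8):

```python
import numpy as np

# Hedged numerical check of Eq. (7): for I(w) ∝ 1/((w-w0)^2 + (A/2)^2),
# the ratio of the integrated line to its peak value should be pi*A/2.
# w0 and A are arbitrary illustrative values.

w0, A = 5.0, 0.01
w = np.linspace(w0 - 2000 * A, w0 + 2000 * A, 400_001)
I = 1.0 / ((w - w0)**2 + (A / 2.0)**2)

dw_eff = np.trapz(I, w) / I.max()   # effective bandwidth
print(dw_eff, np.pi * A / 2.0)
```

The small residual difference comes only from truncating the Lorentzian tails at ±2000 linewidths.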
On the other hand, the classical radiation reaction field generated by a dipole
of moment p:
\[ E_{RR} = \frac{2}{3c^3}\,\frac{d^3 p}{dt^3} = \frac{2\omega^3 p}{3c^3} , \tag{10} \]
394 E. T. Jaynes
corresponds to an energy density
\[ W_{RR} = \frac{E_{RR}^2}{8\pi} = \frac{p^2\omega^6}{18\pi c^6}\ \text{ergs/cm}^3 . \tag{11} \]
But Eqs. (9) and (11) are identical! A radiating atom is indeed interacting with an
electric field of just the magnitude predicted by the zero-point calculation, but this
is the atom's own radiation reaction field.
Now we can see that this needed field is generated by the radiating atom,
automatically but in a more economical way; only where it is needed, when it is
needed, and in the frequency band needed. Spontaneous emission does not require
an infinite energy density throughout all space. Surely, this is a potentially far more
satisfactory way of looking at the mechanism of spontaneous emission (if we can
clear up some details about the dynamics of the process).
But then someone will point immediately to the Lamb shift; does this not
prove the reality of the ZP energy? Indeed, Schwinger17,18 and Weisskopf 22 stated
explicitly that ZP field fluctuations are the physical cause of the Lamb shift, and
Welton23 gave an elementary "classical" derivation of the effect from this premise.
Even Niels Bohr concurred. To the best of our knowledge, the closest he ever
came to making an ontological statement was uttered while perhaps thrown mo-
mentarily off guard under the influence of Schwinger's famous eight-hour lecture at
the 1948 Pocono conference. As recorded in John Wheeler's notes on that meeting,
Bohr says: "It was a mistake in the older days to be discontented with field and
charge fluctuations. They are necessary for the physical interpretation."
In 1953 Dyson7 also concurred, picturing the quantized field as something akin
to hydrodynamic flow with superposed random turbulence, and he wrote: "The
Lamb-Retherford experiment is the strongest evidence we have for believing that
our picture of the quantum field is correct in detail." Then in 1961 Feynman sug-
gested that it should be possible to calculate the Lamb shift from the change in
total ZP energy in space due to the presence of a hydrogen atom in the 2s state;
and in 1966 E. A. Power16 gave the calculation demonstrating this in detail. How
can we possibly resist such a weight of authority and factual evidence?
As it turns out, quite easily. The problem has been that these calculations
have been done heretofore only in a quantum field theory context. Because of this,
people jumped to the conclusion that they were quantum effects (i.e., effects of
field quantization), without taking the trouble to check whether they were present
also in classical theory. As a result, two generations of physicists have regarded the
Lamb shift as a deep, mysterious quantum effect that ordinary people cannot hope
to understand. So we are facing not so much a weight of authority and facts as a
mass of accumulated folklore.
Since our aim now is only to explain the elementary physics of the situation
rather than to give a full formal calculation, let us show that this radiative fre-
quency shift effect was present already in classical theory, and that its cause lies
simply in properties of the source field (Eq. (1)), having nothing to do with field
fluctuations. In fact, by stating the problem in Hamiltonian form, we can solve
it exactly: consider an extra oscillator (EO) of frequency Ω coupled linearly to a
set of field-mode oscillators,
Probability in Quantum Theory 395
\[ H = \frac{1}{2}\sum_i \left(p_i^2 + \omega_i^2 q_i^2\right) + \frac{1}{2}\left(P^2 + \Omega^2 Q^2\right) - \sum_i a_i q_i Q . \tag{12} \]
The physical effects of coupling the EO to the field variables may be calculated in
two "complementary" ways:
(I) Dynamic: how are the EO oscillations modified by the field coupling?
(II) Static: what is the new distribution of normal mode frequencies?
The new normal modes are the roots {νⱼ} of the equation Ω² − ν² = K(ν),
where K(ν) is the dispersion function
\[ K(\nu) \equiv \sum_i \frac{a_i^2}{\omega_i^2 - \nu^2} = \int_0^\infty K(t)\,e^{-st}\,dt , \qquad s = i\nu . \tag{13} \]
Let us solve the problem first in the more familiar dynamical way. With initially
quiescent field modes qᵢ(0) = q̇ᵢ(0) = 0, the decay of the extra oscillator is found
to obey a Volterra equation:
\[ \ddot{Q}(t) + \Omega^2 Q(t) = \int_0^t K(t-t')\,Q(t')\,dt' . \tag{14} \]
Thus K(t) is a memory function and the integral in Eq. (14) is a source field. For
arbitrary initial EO conditions Q(0), Q̇(0), the solution is
\[ Q(t) = Q(0)\,\dot{G}(t) + \dot{Q}(0)\,G(t) \tag{15} \]
with the Green's function
\[ G(t) = \frac{1}{2\pi}\int \frac{e^{i\nu t}\,d\nu}{\Omega^2 - \nu^2 - K(\nu)} , \tag{16} \]
where the contour goes under the poles on the real axis. This is the exact decay
solution for arbitrary field mode patterns.
In the limit of many field modes, this goes into a simpler form. There is a mode
density function ρ₀(ω):
\[ \sum_i (\;) \;\longrightarrow\; \int_0^\infty (\;)\,\rho_0(\omega)\,d\omega . \tag{17} \]
Then from Eq. (13), K(ν) goes into a slowly varying function on the path of inte-
gration in Eq. (16):
\[ K(\nu) \;\longrightarrow\; {\rm P}\!\int_0^\infty \frac{a^2(\omega)\rho_0(\omega)\,d\omega}{\omega^2-\nu^2} \;-\; \frac{i\pi a^2(\nu)\rho_0(\nu)}{2\nu} \;=\; -2\Omega\,(\Delta + i\Gamma) , \tag{18} \]
and neglecting some small terms, the resulting Green's function goes into
\[ G(t) \;\longrightarrow\; e^{-\Gamma t}\,\frac{\sin(\Omega+\Delta)t}{(\Omega+\Delta)} , \tag{19} \]
where
\[ \Gamma(\Omega) = \frac{\pi a^2(\Omega)\,\rho_0(\Omega)}{4\Omega^2} \tag{20} \]
and
\[ \Delta(\Omega) = \frac{1}{2\Omega}\,{\rm P}\!\int_0^\infty \frac{a^2(\omega)\rho_0(\omega)\,d\omega}{\Omega^2-\omega^2} = \frac{1}{\pi}\,{\rm P}\!\int_{-\infty}^{\infty} \frac{\Gamma(\omega)\,d\omega}{\Omega-\omega} \tag{21} \]
are the "spontaneous emission rate" and "radiative frequency shift" exhibited by
the EO due to its coupling to the field modes. We note that Δ(Ω) and Γ(Ω) form
a Hilbert transform pair (a Kramers-Kronig-type dispersion relation expressing
causality). In this approximation, Eq. (15) becomes the standard exponentially
damped solution of a linear differential equation with loss: Q̈ + 2ΓQ̇ + (Ω + Δ)²Q = 0.
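The whole dynamical calculation can be sketched numerically by diagonalizing the quadratic Hamiltonian of Eq. (12) exactly for a finite set of modes. In this hedged illustration (all parameter values arbitrary, uniform coupling a(ω) = a assumed), the EO amplitude decays at the rate Γ = πa²ρ₀/4Ω² of Eq. (20), even though every field mode starts strictly quiescent:

```python
import numpy as np

# Hedged sketch: an extra oscillator (EO) of frequency Omega coupled to N
# initially quiescent field modes.  We diagonalize x'' = -M x exactly and
# check that Q(t) decays at Gamma = pi*a^2*rho0/(4*Omega^2), Eq. (20).

Omega = 1.0
N     = 1200
w     = np.linspace(0.4, 1.6, N)          # field-mode frequencies
rho0  = (N - 1) / (w[-1] - w[0])          # mode density
a     = 3.57e-3                           # uniform coupling (illustrative)
gamma_theory = np.pi * a**2 * rho0 / (4.0 * Omega**2)

# Force matrix for x = (Q, q_1, ..., q_N); symmetric, from Eq. (12).
M = np.diag(np.concatenate(([Omega**2], w**2)))
M[0, 1:] = M[1:, 0] = -a
nu2, V = np.linalg.eigh(M)
nu = np.sqrt(nu2)

# Q(0) = 1, everything else zero -> Q(t) = sum_k V[0,k]^2 cos(nu_k t)
t  = np.linspace(0.0, 250.0, 2000)
ck = V[0, :]**2
Q  = (ck[:, None] * np.cos(np.outer(nu, t))).sum(axis=0)
Qd = -(ck[:, None] * nu[:, None] * np.sin(np.outer(nu, t))).sum(axis=0)

amp = np.sqrt(Q**2 + (Qd / Omega)**2)     # slowly varying envelope
fit = (t > 30) & (t < 220)
rate_measured = -np.polyfit(t[fit], np.log(amp[fit]), 1)[0]
print(rate_measured, gamma_theory)
```

The simulated time span is kept well below the recurrence time 2πρ₀, so the finite mode set behaves like a continuum; no fluctuating field appears anywhere in the calculation.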
As a check, it is a simple homework problem to compare our damping factor r
with the well-known Larmor radiation law, by inserting into the above formulas the
free-space mode density function ρ₀(ω) = Vω²/π²c³, and the coupling coefficients
aᵢ appropriate to an electric dipole of moment p proportional to Q. We then find
\[ \Gamma(\omega) = \left(\frac{\pi}{4\Omega^2}\right)\left(\frac{4\pi\omega^2 p^2}{3V}\right)\left(\frac{V\omega^2}{\pi^2 c^3}\right) = \frac{p^2\omega^4}{3\Omega^2 c^3}\ \ {\rm sec}^{-1} , \tag{22} \]
and it is easily seen that for the average energy loss over a cycle this agrees exactly
with the Larmor formula
2c0 . ,
Prad = 3 kx? (23)
for radiation from an accelerated particle. In turn, the correspondence between the
Larmor radiation rate and the Einstein A-coefficient (6) is well-known textbook
material.
It is clear from this derivation that the spontaneous emission and the radiative
frequency shift do not require field fluctuations, since we started with the explicit
initial condition of a quiescent field: qᵢ(0) = q̇ᵢ(0) = 0. The damping and shifting are due
entirely to the source field reacting back on the source, as expressed by the integral
in Eq. (14).
Of course, although the frequency shift formula (21) resembles the "Bethe loga-
rithm" expression for the Lamb shift, we cannot compare them directly because our
model is not a hydrogen atom; we have no s-states and p-states. But if we use values
of aᵢ and Ω for an electron oscillating at optical frequencies and use a cutoff corre-
sponding to the size of the hydrogen atom, we get shifts of the order of magnitude
of the Lamb shift. A more elaborate calculation will be reported elsewhere.
But now this seems to raise another mystery; if field fluctuations are not the
cause of the Lamb shift, then why did the aforementioned Welton and Power calcu-
lations succeed by invoking those fluctuations? We face here a very deep question
about the meaning of "fluctuation-dissipation theorems." There is a curious math-
ematical isomorphism; throughout this century, starting with Einstein's relation
between diffusion coefficient and mobility, D = ⟨(δx)²⟩/2t = kTμ, and the Nyquist
thermal noise formula for a resistor, ⟨(δV)²⟩ = 4kTRΔf, theoreticians have been deriv-
ing a steady stream of relations connecting "stochastic" problems with dynamical
problems.
Indeed, for every differential equation with a non-negative Green's function,
there is an obvious stochastic problem which would have the same mathematical
solution even though the problems are quite unrelated physically, but as Mark
Kac14 showed, the mathematical correspondence between stochastic and dynamical
problems is much deeper and more general than that.
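The simplest instance of this correspondence can be exhibited in a few lines. A hedged sketch (arbitrary parameters, not from the paper): the Green's function of the diffusion equation u_t = D u_xx is also the transition density of a Brownian motion with ⟨dx²⟩ = 2D dt, so the purely "dynamical" PDE problem and the purely "stochastic" walker problem share one solution:

```python
import numpy as np

# Hedged illustration of the stochastic/dynamical correspondence: the
# sample variance of simulated Brownian walkers should match the variance
# 2*D*T of the heat-kernel Green's function.  D, T, counts are arbitrary.

rng = np.random.default_rng(0)
D, T, n_steps, n_walk = 0.5, 2.0, 200, 100_000
dt = T / n_steps

x = np.zeros(n_walk)
for _ in range(n_steps):                  # simulate Brownian paths
    x += rng.normal(0.0, np.sqrt(2.0 * D * dt), n_walk)

var_sample = x.var()
var_theory = 2.0 * D * T                  # heat-kernel variance
print(var_sample, var_theory)
```

Nothing here says the walkers are "really there" in the PDE problem; the two problems merely share mathematics, which is the point of the text above.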
These relations do not prove that the fluctuations are real; they show only that
certain dissipative effects (i.e., disappearance of the extra oscillator energy into the
field modes) are the same as if fluctuations were present. But then by the Hilbert
transform connection noted, the corresponding reactive effects must also be the
same as if fluctuations were present; the calculation of Welton23 shows how this
comes about.
But this still leaves a mystery surrounding the Feynman-Power calculation,
which obtains the Lamb shift from the change in total ZP energy in the space
surrounding the hydrogen atom; let us explain how that can be.
To calculate the mode density increment, we need to evaluate the limiting form of
the dispersion function K(v) more carefully than in Eq. (18).
From the Hamiltonian (12), the normal modes are the roots {νₖ} of the disper-
sion equation
\[ \Omega^2 - \nu^2 = K(\nu) = \sum_i \frac{a_i^2}{\omega_i^2 - \nu^2} . \tag{25} \]
K(ν) resembles a tangent function, having poles at the free field mode frequencies ωᵢ
and zeroes close to midway between them. Suppose that the unperturbed frequency
Ω of the EO lies in the cell (ωᵢ < Ω < ωᵢ₊₁). Then the field modes above it are
raised by amounts δνₖ = νₖ − ωₖ, k = i+1, i+2, ..., n. The field modes below it are
lowered by δνₖ = νₖ₋₁ − ωₖ, k = 1, 2, ..., i; and one new normal mode νᵢ appears
in the same cell as Ω: (ωᵢ < νᵢ < ωᵢ₊₁). The separation property (exactly one new
mode νₖ lies between any two adjacent old modes ωₖ) places a stringent limitation
on the magnitude of any static mode shift δνₖ.
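The separation property is easy to confirm by root-finding on the dispersion equation. A hedged toy example (a small arbitrary spectrum, uniform couplings) of Eq. (25), showing exactly one normal mode below the band, one above it, and one in each cell between adjacent old modes — N + 1 modes in all:

```python
import numpy as np
from scipy.optimize import brentq

# Hedged illustration of the separation property for the roots of
# Omega^2 - nu^2 = K(nu) = sum_i a_i^2/(w_i^2 - nu^2), Eq. (25).

Omega = 1.0
w = np.linspace(0.5, 1.5, 11)        # old field-mode frequencies
a = 0.05 * np.ones_like(w)           # coupling constants (arbitrary)

def f(nu):
    # dispersion function; sign changes exactly once between poles
    return Omega**2 - nu**2 - np.sum(a**2 / (w**2 - nu**2))

eps = 1e-9
edges = np.concatenate(([1e-3], w, [3.0]))   # bracket every cell
roots = [brentq(f, lo + eps, hi - eps)
         for lo, hi in zip(edges[:-1], edges[1:])]
print(len(roots))
```

Each bracketing interval contains one sign change of f, so `brentq` returns one root per cell; the count comes out to len(w) + 1, the extra mode "inserted into the gap" as described next.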
Thus the original field modes ωᵢ are, so to speak, pushed aside by a kind of
repulsion from the added frequency Ω, and one new mode is inserted into the gap
thus created. If there are many field modes, the result is a slight increase ρ₁(ν) in
mode density in the vicinity of Ω. To calculate it, note that if the field mode ωᵢ is
shifted a very small amount to νₖ = ωᵢ + δν, and δν varies with ωᵢ, then the mode
density is changed to
\[ \rho(\omega) = \rho_0(\omega) + \rho_1(\omega) , \qquad \rho_1(\omega) = -\frac{d}{d\omega}\bigl[\rho_0(\omega)\,\delta\nu(\omega)\bigr] . \tag{26} \]
In the continuum limit, ρ₀ → ∞ and δν → 0; however, the increment ρ₁(ω) remains
finite and, as we shall see, loaded with physical meaning.
We now approximate the dispersion function K(ν) more carefully. In Eq. (16),
where Im(ν) < 0, we could approximate it merely by the integral, since the local
behavior (the infinitely fine-grained variation in K(ν) from one pole to the next)
cancels out in the limit at any finite distance from the real axis. But now we need
it exactly on the real axis, and those fine-grained local variations are essential,
because they provide the separation property that limits the static mode shifts δν.
Consider the case where ωᵢ > Ω and ν lies in the cell (ωᵢ < ν < ωᵢ₊₁). Then
the modes are pushed up. If the old modes near ν are about uniformly spaced, we
have for small n, ωᵢ₊ₙ ≈ ωᵢ + n/ρ₀(ω); therefore
\[ \omega_{i+n}^2 - \nu^2 \simeq \frac{2\nu}{\rho_0}\,(n - \rho_0\,\delta\nu) , \tag{27} \]
and the sum of terms with poles near ν goes into
\[ \sum_n \frac{a_{i+n}^2}{\omega_{i+n}^2-\nu^2} \simeq \frac{a^2(\nu)\rho_0(\nu)}{2\nu}\sum_n \frac{1}{n-\rho_0\,\delta\nu} = -\frac{\pi a^2(\nu)\rho_0(\nu)}{2\nu}\,\cot[\pi\rho_0(\nu)\,\delta\nu] , \tag{28} \]
where we supposed the aᵢ slowly varying and recognized the Mittag-Leffler expan-
sion π cot πx = Σₙ (x − n)⁻¹. The contribution of poles far from ν can again be
represented by an integral. Thus, on the real axis, the dispersion function goes, in
the continuum limit, into
\[ K(\nu) \simeq -\frac{\pi a^2(\nu)\rho_0(\nu)}{2\nu}\,\cot[\pi\rho_0(\nu)\,\delta\nu] + {\rm P}\!\int_0^\infty \frac{a^2(\omega)\rho_0(\omega)\,d\omega}{\omega^2-\nu^2} . \]
But in this we recognize our expressions (20) and (21) for Γ and Δ:
\[ K(\nu) \simeq -2\Omega\bigl[\Delta + \Gamma\,\cot(\pi\rho_0\,\delta\nu)\bigr] . \tag{29} \]
As a check, note that if we continue δν below the real axis, the cotangent goes
into cot(−ix) → +i, and we recover the previous result (18). Thus if we again
assume a sharp resonance (Ω ≈ ν) and write the dynamically shifted frequency as
ω₀ = Ω + Δ, the dispersion relation (25) becomes a formula for the static mode
shift δν:
\[ \pi\rho_0(\nu)\,\delta\nu = \tan^{-1}\!\left(\frac{\Gamma}{\nu-\omega_0}\right) , \tag{30} \]
and (26) then yields for the increment in mode density a Lorentzian function:
\[ \rho_1(\nu)\,d\nu = \frac{1}{\pi}\,\frac{\Gamma\,d\nu}{(\nu-\omega_0)^2+\Gamma^2} , \tag{31} \]
with the same shift and width as we found in the dynamical calculation (14).
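The step from Eq. (30) to Eq. (31) can be checked numerically. A hedged sketch with arbitrary parameters: taking the arctangent branch in (0, π), the derivative of the static mode shift reproduces the Lorentzian increment, whose integral is unity:

```python
import numpy as np

# Hedged check that rho1 = -d(rho0*dnu)/dnu, with
# pi*rho0*dnu = arctan(Gamma/(nu - w0)) taken in (0, pi), is the
# normalized Lorentzian (Gamma/pi)/((nu-w0)^2 + Gamma^2).

w0, Gamma = 1.0, 0.01
nu = np.linspace(w0 - 50 * Gamma, w0 + 50 * Gamma, 200_001)

rho0_dnu   = np.arctan2(Gamma, nu - w0) / np.pi    # branch in (0, 1)
rho1_num   = -np.gradient(rho0_dnu, nu)
rho1_exact = (Gamma / np.pi) / ((nu - w0)**2 + Gamma**2)

max_rel_err = np.max(np.abs(rho1_num - rho1_exact)) / rho1_exact.max()
mass = np.trapz(rho1_exact, nu)     # approaches 1 over all nu
print(max_rel_err, mass)
```

Using `arctan2` keeps the shift continuous through ν = ω₀, which is exactly the branch choice needed for the normalization ∫ρ₁ dν = 1 discussed below.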
As a check, note that the increment is normalized, ∫ρ₁ dν = 1, as it should be,
since the "macroscopic" effect of the coupled EO is just to add one more new mode to
the system. Note also that the result (31) depended on K(ν) going locally into a
tangent function. If for any reason (e.g., highly nonuniform mode spacing or coupling
constants, even in the limit) K(ν) does not go into a tangent function, we will not
get a Lorentzian p1(v). This would signify perturbing objects in the field, or cavity
walls that do not recede to infinity in the limit, so echoes from them remain.
But the connection (32) between the mode density increment and the decay
law is quite general. It does not depend on the Lorentzian form of pi(v), on the
particular equation of motion for Q, on whether we have one or many resonances
Q, or indeed on any property of the perturbing EO other than the linearity of its
response.
To see this, imagine that all normal modes are shock excited simultaneously
with arbitrary amplitudes A(ν). Then the response is a superposition of all modes:
\[ x(t) = \int_0^\infty A(\nu)\,e^{i\nu t}\,\rho_0(\nu)\,d\nu \;+\; \int_0^\infty A(\nu)\,e^{i\nu t}\,\rho_1(\nu)\,d\nu . \tag{33} \]
But since the first integral represents the response of the free field, the second must
represent the "ringing" of whatever perturbing objects are present. If A(ν) is nearly
constant in the small bandwidth occupied by a narrow peak in ρ₁(ν), the resonant
ringing goes into the form (32).
Therefore, every detail of the transient decay of the dynamical problem is, so
to speak, "frozen into" the static mode density increment function ρ₁(ν) and can
be extracted by taking the Fourier transform (32). Thus a bell, excited by a pulse
of sound, will ring out at each of its resonant frequencies, each separate resonance
having a decay rate and radiative frequency shift determined by ρ₁(ν) in the vicinity
of that resonance.
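That the decay law is "frozen into" the static increment can be verified directly. A hedged numerical sketch (arbitrary parameters): the Fourier transform of the Lorentzian ρ₁(ν) of Eq. (31) rings at ω₀ with envelope e^{−Γt}:

```python
import numpy as np

# Hedged check: |∫ rho1(nu) e^{i nu t} dnu| should equal e^{-Gamma t},
# the decay law of the dynamical calculation.  Parameters arbitrary.

w0, Gamma = 1.0, 0.01
nu = np.linspace(w0 - 400 * Gamma, w0 + 400 * Gamma, 400_001)
rho1 = (Gamma / np.pi) / ((nu - w0)**2 + Gamma**2)

vals = []
for t in (50.0, 100.0, 200.0):            # Gamma*t = 0.5, 1, 2
    ring = np.trapz(rho1 * np.exp(1j * nu * t), nu)
    vals.append((t, abs(ring)))
    print(t, abs(ring), np.exp(-Gamma * t))
```

The small deviations come only from truncating the Lorentzian tails; the ringing frequency ω₀ and decay rate Γ are recovered exactly as the text describes for the bell and for the 2s hydrogen atom below.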
Then a hydrogen atom in the 2s state, excited by a sharp electromagnetic pulse,
will "ring out" at the frequencies of all the absorption or emission lines that start
from the 2s state, and information about all the rates of decay and all the radiative
line shifts is contained in the ρ₁(ν) perturbation that the presence of that atom
makes in the field-mode density.
Thus Feynman's conjecture about the relation between the Lamb shift and the
change in ZP energy of the field around that atom is now seen to correspond to a
perfectly general relation that was present all the time in classical electromagnetic
and acoustical theory, and might have been found by Rayleigh, Helmholtz, Maxwell,
Larmor, Lorentz, or Poincaré in the last century.
It remains to finish the Power-type calculation and show that simple classical
calculations can also be done by the more glamorous quantum mechanical methods
of "subtraction physics" if one wishes to do so. Suppose we put the extra oscillator
in place and then turn on its coupling to the field oscillators. Before the coupling is
turned on, we have a background mode density po(w) with a single sharp resonance,
mode density 8(w — n) superimposed. Turning on the coupling spreads this out into
pi(w), superimposed on the same background, and shifts its center frequency by
just the radiative shift A. In view of the normalization of pi(w), we can write
00
A= wpi (w)dco — St . (34)
0
Suppose, then, that we had asked a different question: "What is the total frequency
shift in all modes due to the coupling?" Before the coupling is turned on, the total
frequency is a badly divergent expression:
\[ \Bigl(\sum\omega\Bigr)_1 = \Omega + \int_0^\infty \omega\,\rho_0(\omega)\,d\omega , \tag{35} \]
and afterward it is
\[ \Bigl(\sum\omega\Bigr)_2 = \int_0^\infty \omega\,\bigl[\rho_0(\omega) + \rho_1(\omega)\bigr]\,d\omega , \tag{36} \]
which is no better. But then the total change in all mode frequencies due to the
coupling is, from Eq. (34):
\[ \Bigl(\sum\omega\Bigr)_2 - \Bigl(\sum\omega\Bigr)_1 = \Delta . \tag{37} \]
CONCLUSION
We have explored only a small part of the issues that we have raised; however, it is
the part that has seemed the greatest obstacle to a unified treatment of probability
in quantum theory. Its resolution was just a matter of getting our physics straight;
we have been fooled by a subtle mathematical correspondence between stochastic
and dynamical phenomena, into a belief that the "objective reality" of vacuum
fluctuations and ZP energy are experimental facts. With the realization that this
is not the case, many puzzling difficulties disappear.
We then see the possibility of a future quantum theory in which the role of in-
complete information is recognized: the dispersion (ΔF)² = ⟨F²⟩ − ⟨F⟩² represents
fundamentally only the accuracy with which the theory is able to predict the value
of F. This may or may not be also the variability in the measured values.
In particular, when we free ourselves from the delusion that probabilities are
physically real things, then when ΔF is infinite, that does not mean that any
physical quantity is infinite. It means only that the theory is completely unable to
predict F. The only thing that is infinite is the uncertainty of the prediction. In
our view, this represents the beginning of a far more satisfactory way of looking
at quantum theory, in which the important research problems will appear entirely
different than they do now.
REFERENCES
1. Allen, L., and J. H. Eberly. Optical Resonance and Two-Level Atoms, chap. 7.
New York: J. Wiley and Sons, 1975.
2. Bretthorst, G. L. "Bayesian Spectrum Analysis and Parameter Estimation."
Springer Lecture Notes in Statistics 48 (1988).
3. Bretthorst, G. L., C. Hung, D. A. D'Avegnon, and J. H. Ackerman. "Bayesian
Analysis of Time-Domain Magnetic Resonance Signals." J. Mag. Res. 79
(1988):369-376.
4. Bretthorst, G. L., J. J. Kotyk, and J. H. Ackerman. "31P NMR Bayesian
Spectral Analysis of Rat Brain in Vivo." Mag. Res. in Medicine 9 (1989):282-
287.
5. Bretthorst, G. L., and C. Ray Smith. "Bayesian Analysis of Signals from
Closely Spaced Objects." In Infrared Systems and Components III, edited by
Robert L. Caswell, vol. 1050. San Francisco: SPIE, 1989, 93-104.
6. Casimir, H. B. G. Proc. K. Ned. Akad. Wet. 51 (1948):635.
7. Dyson, F. J. "Field Theory." Sci. Am. (April 1953):57.
8. Jaynes, E. T. "Confidence Intervals vs. Bayesian Intervals." In Foundations
of Probability Theory, Statistical Inference, and Statistical Theories of Sci-
ence, edited by W. L. Harper and C. A. Hooker. Dordrecht-Holland: D. Rei-
del Pub. Co., 1976; reprinted in part in Ref. 10.
9. Jaynes, E. T. "Where Do We Stand on Maximum Entropy?" In The Maxi-
mum Entropy Formalism, edited by R. D. Levine and M. Tribus. Cambridge:
MIT Press, 1978; reprinted in Ref. 10.
10. Jaynes, E. T. Papers on Probability, Statistics, and Statistical Physics, edited
by R. D. Rosenkrantz. Holland: D. Reidel Publishing Co., 1983; reprints of
13 papers dated 1957-1980. Second paperback edition by Kluwer Academic
Publishers, Dordrecht, 1989.
11. Jaynes, E. T. "Bayesian Methods: General Background." In Maximum En-
tropy and Bayesian Methods in Applied Statistics, edited by J. H. Justice.
Cambridge: Cambridge University Press, 1986, 1-25.
12. Jaynes, E. T. "Clearing up Mysteries: The Original Goal." In Maximum En-
tropy and Bayesian Methods, edited by J. Skilling. Dordrecht: Kluwer Academic
Publishers, 1989, 1-27.
13. Jeffreys, H. Theory of Probability. Oxford: Oxford Univ. Press, 1939; later edi-
tions 1948, 1961, and 1966. A wealth of beautiful applications showing in de-
tail how to use probability theory as logic.
14. Kac, M. "Some Stochastic Problems in Physics and Mathematics." Collo-
quium Lectures in Pure and Applied Science, No. 2. Dallas, Texas: Field Research
Laboratory, Magnolia Petroleum Company, 1956.
15. Milonni, P., J. Ackerhalt, and R. A. Smith. "Interpretation of Radiative Cor-
rections in Spontaneous Emission." Phys. Rev. Lett. 31 (1973):958.
16. Power, E. A. "Zero-Point Energy and the Lamb Shift." Am. J. Phys. 34
(1966):516. Note that factors of 2 are missing from Eqs. (13) and (15).
INTRODUCTION
Measurements in general are performed in order to increase information about
physical systems. This information, if appropriate, may in principle be used for
a reduction of their thermodynamical entropies—as we know from the thought
construction of Maxwell's demon.
As we have been taught by authors like Smoluchowski, Szilard, Brillouin and
Gabor, one thereby has to invest at least the equivalent measure of information
(therefore also called "negentropy") about a physical system in order to reduce
its entropy by a certain amount. This is either required by the Second Law (if
it is applicable for this purpose), or it can be derived within classical statistical
mechanics by using
a. determinism and
b. the assumption that perturbations from outside may be treated stochastically
in the forward direction of time (condition of "no conspiracy").
The total ensemble entropy may then never decrease, and one can use diagrams
such as that in Figure 1 to represent sets of states for the system being measured (a, b),
the measurement and registration device (0, A, B), and the environment (A', B')
which is required for the subsequent reset of the apparatus.
In statistical arguments of this kind, no concepts from phenomenological ther-
modynamics have to be used. Statistical counting is a more fundamental notion
than energy conservation or temperature if the concept of deterministically evolv-
ing microscopic states is assumed to apply. The price to be paid for this advantage
is the problem arising from the fact (much discussed at this conference) that the
statistical ensemble entropy is not uniquely related to thermodynamical entropy.
This problem is even more important in the quantum mechanical description.
In quantum theory the statistical entropy is successfully calculated from the density
matrix (regardless of the latter's interpretation). This density matrix changes non-
unitarily (i.e., the state vectors diagonalizing it change indeterministically) in a
measurement process (a situation usually referred to as the collapse or reduction
of the wave function). So, for example, Pauli concluded that "the appearance of a
certain result in a measurement is then a creation outside the laws of nature." This
may be a matter of definition—but the state vector (as it is used to describe an
actual physical situation) is affected by the collapse, and so is the entropy calculated
from it or from the density matrix!
Must this deviation from the deterministic Schrödinger equation now lead to a
violation of the Second Law, as discussed in a beautiful way at this conference by
Peres? In particular, can Maxwell's demon possibly return through the quantum
back door?
[Figure 1: measurement ("msmt") and reset of the apparatus, with ensemble entropy S and information I passing through S = k ln 2 + c, I = 0 → S = c, I = k ln 2 → S = k ln 2 + c, I = 0.]

[Figure 3: the early radiation era as a non-ideal absorber: T = ∞ at t = 0, T ≈ 4·10³ K at t ≈ 3·10⁵ a, cooled by now to the 2.7 K background.]
where "nearby" may in astronomical situations include stars and galaxies. Eighty
years ago Ritz required that this condition should hold by law of nature, that
is, exactly if considered for the whole universe. This assumption would eliminate
the electromagnetic degrees of freedom and replace them by a retarded action at a
distance. It corresponds to a cosmological initial condition ("Sommerfeld radiation
condition") F^in = 0.
A similar proposal was made 70 years later by Penrose for gravity instead of
electrodynamics in terms of his Weyl tensor hypothesis. Both authors expressed the
expectation that their assumptions might then also explain the thermodynamical
arrow of time.
The usual explanation of the electromagnetic arrow is that it is instead caused
by the thermodynamical arrow of absorbers (see Figure 2): no field may leave an
ideally absorbing region in the forward direction of time. (The same condition is re-
quired in the Wheeler-Feynman absorber theory in addition to their time-symmetric
"absorber condition.")
The electrodynamical arrow of time can then easily be understood inside of
closed laboratories possessing absorbing walls. In cosmology (where the specific
boundary condition is referred to as Olbers' paradox) the situation is slightly differ-
ent. According to the big bang model there was a non-ideal (hot) absorber early in
the universe (the radiation era; see Figure 3). Its thermal radiation has now cooled
down to form the observed 2.7 K background radiation which is compatible with
the boundary condition at wave lengths normally used in experiments. This early
absorber hides the true F^in from view—although it is "transparent" for gravity.
However, it is important for many thermodynamical consequences that zero-
mass fields possess a very large entropy capacity ("blank paper" in the language of
information physics). This is true in particular for gravity because of the general
attractivity and self-interaction that leads to the formation of black holes.15
Quantum Measurements 409
This leads to a coupled dynamics for ρ_rel = Pρ and ρ_irrel = (1 − P)ρ according to
\[ i\,\frac{\partial \rho_{\rm rel}}{\partial t} = PL\rho_{\rm rel} + PL\rho_{\rm irrel} , \]
\[ i\,\frac{\partial \rho_{\rm irrel}}{\partial t} = (1-P)L\rho_{\rm rel} + (1-P)L\rho_{\rm irrel} . \]
Then formally solve the second equation for Pirrel(t) with pr el as an inhomo-
geneity (just as when calculating the electromagnetic field as a functional of the
sources), and insert it into the first one to get the still exact (hence reversible)
pre-master equation for pre:
\[ S = -k\int \rho_{\rm rel}\,(\ln \rho_{\rm rel})\,dp\,dq \]
to obtain dS/dt ≥ 0, that is, a monotonic loss of relevant information. In gen-
eral, however, the equality sign dS/dt ≈ 0 would be overwhelmingly probable
unless the further initial condition S(t = 0) ≪ S_max held. This fact again
possesses its analogue in electrodynamics: not all sources must be absorbers in
order to prevent a trivial situation.
[Figure: information flow from the relevant channel ρ_rel(0) → ρ_rel(t) into the irrelevant channel ρ_irrel, shown between t = 0 and t = t₁.]
There exist many concepts of relevance (or "faces of entropy") suited for differ-
ent purposes. Best known are Gibbs' coarse graining and Boltzmann's restriction
to single-particle phase space (with the initial condition of absent particle correla-
tions referred to as molecular chaos). They depend even further on the concept of
particles used (elementary or compound, often changing during phase transitions
or chemical reactions). Two others, P_local and P_macro, will be considered in more
detail.
For most of the relevance concepts used, the condition ρ_irrel(0) = 0 does not
appear very physical, since it refers to the knowledge described by the ensembles.
Only some of them are effective on pure ("real") states, that is, they define a non-
trivial entropy or "representative ensemble" as a function of state. All of them
are based on a certain observer-relatedness: there is no objective reason for using
ensembles or a concept of relevance. Also, Kolmogorov's entropy is based on a
relevant measure of distance, while algorithms (used to define algorithmic entropy)
are based on a choice of relevant coordinates. Hence, what we call chaos may merely
be chaos to us!
Two Zwanzig projections will be of particular interest to illuminate the special
character of quantum aspects. The first one is the locality projection
P_local ρ =
The last statement is not true any more in quantum mechanics because of the
existence of the fundamental quantum correlations which lead to the violation of
the Bell inequality.
The second projection of special interest is defined by
412 H. D. Zeh
\[ P_{\rm macro}\,\rho(p, q) := \frac{p_\alpha}{V_\alpha} = {\rm const} \quad {\rm on} \quad \alpha(p, q) = {\rm const} , \]
where p_α is the probability and V_α the phase-space volume of the macroscopic cell α.
One may then form −k Σ_α p_α ln p_α and the mean of S(α) = k ln V_α,
which represent the "lacking information" about the macroscopic quantities and
the mean "physical entropy" (Planck's "number of complexions"),
respectively. This allows the deterministic transformation of physical entropy into
"lacking information" (thereby conserving the ensemble entropy as in Figure 1).
It is, in fact, part of Szilard's gedanken engine (Figure 6), where the transforma-
tion of entropy into lacking information renders the subdensities "robust." In its
quantum version, this first part of the procedure may require the production of
an additional, negligible but non-zero, amount of entropy in order to destroy the
quantum correlations between the two partial volumes.
\[ i\,\frac{\partial \rho}{\partial t} = L\rho = [H, \rho] , \]
with respect to a given (relevant) basis. Zwanzig's equation then becomes the
van Hove equation (with an additional Born approximation the Pauli equation,
or Fermi's Golden Rule after summing over final states). It has the form
\[ \frac{d\rho_{mm}}{dt} = \sum_n A_{mn}\,(\rho_{nn} - \rho_{mm}) , \]
with transition probabilities Amn analogous to Boltzmann's Stof3zahlansatz. The
meaning and validity of Zwanzig's approximation depends crucially on the choice
of the "relevant basis." For example, it would become trivial (Amn = 0) in the
exact energy basis.
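The Pauli-type master equation above can be sketched numerically. A hedged toy instance (random symmetric rates A_mn, Euler integration, all numbers arbitrary), checking the H-theorem behavior dS/dt ≥ 0 for the ensemble entropy:

```python
import numpy as np

# Hedged sketch of dp_m/dt = sum_n A_mn (p_n - p_m) with symmetric rates.
# We integrate a small random instance and check that the ensemble entropy
# S = -sum p ln p never decreases and approaches ln(n) at equilibrium.

rng = np.random.default_rng(1)
n = 8
A = rng.uniform(0.0, 1.0, (n, n))
A = 0.5 * (A + A.T)                     # symmetric rates A_mn = A_nm
np.fill_diagonal(A, 0.0)

p = rng.uniform(0.0, 1.0, n)
p /= p.sum()

dt, steps = 1e-3, 5000
entropies = []
for _ in range(steps):
    entropies.append(-np.sum(p * np.log(p)))
    p = p + dt * (A @ p - A.sum(axis=0) * p)   # dp_m = sum_n A_mn (p_n - p_m)
print(entropies[0], entropies[-1], np.log(n))
```

As the surrounding text stresses, what this toy model cannot show is the difference in *meaning*: in the quantum case the same equation describes fundamental indeterminism rather than ignorance of classical initial conditions.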
In spite of its formal analogy to the classical theory, the quantum master equa-
tion describes the fundamental quantum indeterminism—not only an apparent in-
determinism due to the lack of initial knowledge. For example, Pauli's equation is
identical to Born's original probability interpretation3 (which also introduced the
Born approximation). It was to describe probabilities for new wave functions (not
for classical particle positions), namely for the final states of the quantum jumps
between Schrödinger's stationary eigenstates of the Hamiltonians of noninteracting
local systems (which thus served as the dynamically "relevant basis"). Even these
days the eigenstates of the second (and recently also of the "third") quantization are
sometimes considered as a "natural" and therefore fundamental basis of relevance
to describe the collapse of the wave function as an objective process—although laser
physicists, of course, know better.
Hence, the analogy to the classical theory is misleading. The reason is that
the ensemble produced by the Zwanzig projection P from a pure state in general
does not contain this state itself any more. According to the very foundation of the
concept of the density matrix, it merely describes the probabilities for a collapse
into the original state (or from it into another state).
In order to see this, the measurement process has to be considered in more
detail. Following von Neumann's formulation one may write
\[ \Bigl(\sum_n c_n\psi_n\Bigr)\Phi_0 \;\longrightarrow\; \sum_n c_n\psi_n\Phi_n \;\longrightarrow\; \psi_{n_0}\Phi_{n_0} , \]
where the first step represents an appropriate interaction in accordance with the
Schrödinger equation and the second step the collapse. I have left out an interme-
diate step leading to the ensemble of potential final states with their corresponding
probabilities, since it describes only our ignorance of the outcome. The determin-
istic first step can again be realistic only if Φ represents the whole "rest of the
universe," including the apparatus and the observer. This is the quantum analogue
The corresponding master equation of local relevance requires some initial con-
dition like

(1 − P_local) ρ ≈ 0 .

This process contains the collapse, since the Schrödinger equation with a symmetric
Hamiltonian can only lead from the false vacuum (with Φ ≡ 0) to a symmetric superpo-
sition ∫ dΦ |0⟩_Φ. Unless the false vacuum has the same energy expectation value as
the physical vacuum, the state on the right-hand side must also contain excitations
(which, in fact, contribute to the "measurement" of the value of Φ characterizing a
specific vacuum).
Except for the Casimir/Unruh correlations the vacuum is a local state; that is,
it can approximately be written as the same vacuum at every place, |0⟩ ≈ Π_r |0⟩_r.
This is not true for the symmetric superposition ∫ dΦ |0⟩_Φ. Under the action of P_local,
this non-local state would instead lead to a mixed local density matrix ρ_r,

ρ_r ∝ ∫ dΦ |0⟩_Φ,r ⟨0|_Φ,r .

Only the collapse then leads to a local zero-entropy state again, since it transforms
a non-local state into a local state.
It appears suggestive that a similar mechanism created the degrees of freedom
represented by the realistic zero-mass particles (photons and gravitons). This would
correspond to the creation of a large unoccupied entropy capacity without deter-
ministic "causes" (which would otherwise have to be counted by the previous values
of the ensemble entropy as in Figure 1), or of "blank paper from nothing" by the
symmetry-breaking power of the collapse.
of the sun, and even events behind a spacetime horizon. Denying the Everett in-
terpretation (or considering its other branches as mere "possibilities") is hence just
another kind of solipsism!
This consideration emphasizes the observer-relatedness of the branching (and,
therefore, of entropy). A candidate for its precise formulation may be the Schmidt
canonical single-sum representation

Ψ(t) = Σ_n √p_n(t) φ_n(t) Φ_n(t) ,

with respect to any (local) observer system Φ. It is unique (except for degeneracy),
and therefore defines a "subjective" basis of relevance, although macroscopic prop-
erties contained in the φ_n seem to be objectivized by means of quantum correlations
and the "irreversible" action of decoherence.16
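Numerically, the Schmidt form is obtained from a singular-value decomposition of the state's coefficient matrix. The sketch below (with an arbitrary random state of our choosing) also checks that the p_n are the common eigenvalues of both reduced density matrices, which is what makes the representation unique up to degeneracy:

```python
import numpy as np

# Pure state of system x observer: psi[m, n] = <m, n | Psi>,
# here a random normalized 3x4 complex coefficient matrix.
rng = np.random.default_rng(0)
psi = rng.normal(size=(3, 4)) + 1j * rng.normal(size=(3, 4))
psi /= np.linalg.norm(psi)

# SVD gives the Schmidt (single-sum) form: Psi = sum_n sqrt(p_n) |phi_n>|Phi_n>
u, s, vh = np.linalg.svd(psi, full_matrices=False)
p = s**2                                  # Schmidt probabilities p_n, descending

# The p_n are the eigenvalues of both reduced density matrices
rho_sys = psi @ psi.conj().T              # trace over the observer system
rho_obs = psi.conj().T @ psi              # trace over the observed system
assert np.allclose(np.sort(np.linalg.eigvalsh(rho_sys))[::-1], p)
assert np.isclose(p.sum(), 1.0)
print(np.round(p, 4))
```

The columns of `u` and rows of `vh` are the (generically unique) bases φ_n and Φ_n; a degenerate spectrum p_n is the only case in which they are not fixed.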
H Ψ[(3)G, Φ] = 0 .

This equation does not allow one to impose an initial condition of low entropy in the
usual way. How, then, can correlations such as those which are required to define
the branching evolve?
The answer seems to be contained in the fact that the Wheeler-DeWitt Hamil-
tonian H is hyperbolic. For example, for Friedmann-type models with a massive
quantum field (with its homogeneous part called Φ) one has

H = ∂²/∂α² − ∂²/∂Φ² − ⋯ + V(α, Φ, ⋯) =: ∂²/∂α² + H_red ,

where the dots refer to the higher multipoles of geometry and matter on the Fried-
mann sphere.6 This allows one to impose an initial condition with respect to the
"intrinsic time" α = ln a, the logarithm of the Friedmann expansion parameter. The
reduced dynamics H_red defines an intrinsic determinism, although not, in general,
an intrinsic unitarity, since V(α, ⋯) may be negative somewhere.
Because of the absence, in the wave function, of a term exp(iωt), there is
no meaningful distinction between exp(+ikα) and exp(−ikα). (Halliwell—see his
contribution to this conference—has presented arguments that these components
decohere from one another.) So the intrinsic big bang is identical to the intrinsic big
crunch: they form one common, intrinsically initial, "big brunch."
From its solutions one may construct coherent wave tubes which approximately
define "orbits of causality" (see Figure 7) even when the actual wave function ex-
tends over the whole superspace. Similar behavior is found for other appropriate
potentials, although the wave packets in general show dispersion towards the turn-
ing point in α.10
Corresponding wave functions in high-dimensional superspace show of course
more complex behavior and may lead to an increasing branching with increasing
α if an "initial" condition of lacking correlations holds for α → −∞. If one then
formally follows a turning classical orbit in mini- or midi-superspace, one should
observe branching of the wave function for the microscopic variables on the ex-
pansion leg, but recombination (inverse branching) on the return leg. This point
of view is, however, merely a relic of the concept of classical orbits; the subjective
arrow of time should in each leg be determined by the thermodynamical one. Closer
to the turning point no clearly defined arrow can exist along the classical orbits,
although the situation there seems to be very different from thermal equilibrium.
The consequence of this (in the classical picture "double-ended") quantum ver-
sion of the Cosmic Censorship postulate for the formation of black holes is not yet
understood.17
FIGURE 7 Wave tubes ψ(α, Φ) of the anisotropic indefinite harmonic oscillator, approximately defining "orbits of causality." The symmetry between the two legs of the orbit is an artifact of this model; the inner structure of the tubes is not resolved by the plot.
Such an initial condition could simply be enforced by appropriate potential barriers for the multipole am-
plitudes. It would describe an initially "simple" (since unstructured) universe, al-
though not the ground state of the higher multipoles on the Friedmann sphere. Any
concept of ground or excited states could only be meaningful for them after they
have entered their domain of adiabaticity.
This conceivable existence of a completely symmetric pure initial state (instead
of a symmetric ensemble of very many states, the "real" member of which we were
then unable to determine from this initial condition) is a specific consequence of the
superposition principle, that is, of quantum mechanics. Before the "occurrence" of
the first collapse or branching, the universe would then not contain any non-trivial
degrees of freedom, or any potentiality of complexity.
This determination of the total wave function of the universe from its dynamics
depends of course on the behavior of the realistic potential V(α, Φ, ⋯) for α → −∞.
Since it refers to the Planck era, this procedure would require knowledge about a
completely unified quantum field theory. Hopefully, this property of the potential
may turn out to be a useful criterion to find one! An appropriate potential for the
higher modes would even be able to describe their effective cut-off at wavelengths
of the order of the Planck length (useful for a finite renormalization) at all times.
ACKNOWLEDGMENT
I wish to thank C. Kiefer and H. D. Conradi for their critical reading of the
manuscript. Financial help from the Santa Fe Institute is acknowledged. This con-
tribution was not supported by Deutsche Forschungsgemeinschaft.
Quantum Measurements 421
REFERENCES
1. Bennett, C. H. "Logical Reversibility of Computation." IBM J. Res. Dev. 17
(1973):525.
2. Borel, E. Le hasard. Paris: Alcan, 1924.
3. Born, M. "Das Adiabatenprinzip in der Quantenmechanik." Z. Physik 40
(1926):167.
4. DeWitt, B. S. "Quantum Theory of Gravity. I. The Canonical Theory." Phys.
Rev. 160 (1967):1113.
5. Einstein, A., and Ritz, W. "Zum gegenwärtigen Stand des Strahlungsprob-
lems." Phys. Z. 10 (1909):323.
6. Halliwell, J. J., and S. W. Hawking. "Origin of Structure in the Universe."
Phys. Rev. D31 (1985):1777.
7. Joos, E. "Continuous Measurement: Watchdog Effect versus Golden Rule."
Phys. Rev. D29 (1984):1626.
8. Joos, E., and H. D. Zeh, "The Emergence of Classical Properties through In-
teraction with the Environment." Z. Phys. B59 (1985):223.
9. Kiefer, C. "Continuous Measurement of Mini-Superspace Variables by Higher
Multipoles." Class. Qu. Gravity 4 (1987):1369.
10. Kiefer, C. "Wave Packets in Mini-Superspace." Phys. Rev. D38 (1988):1761.
11. Lubkin, E. "Keeping the Entropy of Measurement: Szilard Revisited." Intern.
J. Theor. Phys. 26 (1987):523.
12. Misra, B., and E. C. G. Sudarshan. "The Zeno's Paradox in Quantum The-
ory." J. Math. Phys. 18 (1977):756.
13. Padmanabhan, T. "Decoherence in the Density Matrix Describing Quantum
Three-Geometries and the Emergence of Classical Spacetime." Phys. Rev.
D39 (1989):2924. See also J. J. Halliwell's contribution to this conference.
14. Penrose, R. "Singularities and Time-Asymmetry." In General Relativity,
edited by S. W. Hawking and W. Israel. Cambridge: Cambridge University
Press, 1979.
15. Penrose, R. "Time Asymmetry and Quantum Gravity." In Quantum Gravity
2, edited by C. J. Isham, R. Penrose and D. W. Sciama. Oxford: Clarendon
Press, 1981.
16. Zeh, H. D. "On the Irreversibility of Time and Observation in Quantum The-
ory." In Enrico Fermi School of Physics IL, edited by B. d'Espagnat. New
York: Academic Press, 1971.
17. Zeh, H. D. "Einstein Nonlocality, Space-Time Structure, and Thermodynam-
ics." In Old and New Questions in Physics, Cosmology, Philosophy, and The-
oretical Biology, edited by A. van der Merwe. New York: Plenum, 1983.
18. Zeh, H. D. "Time in Quantum Gravity." Phys. Lett. A126 (1988):311.
19. Zeh, H. D. The Physical Basis of the Direction of Time. Heidelberg:
Springer, 1989.
QUANTUM MECHANICS IN THE LIGHT OF QUANTUM COSMOLOGY
Murray Gell-Mann and James B. Hartle
I. QUANTUM COSMOLOGY
If quantum mechanics is the underlying framework of the laws of physics, then
there must be a description of the universe as a whole and everything in it in
quantum-mechanical terms. In such a description, three forms of information are
needed to make predictions about the universe. These are the action function of the
elementary particles, the initial quantum state of the universe, and, since quantum
mechanics is an inherently probabilistic theory, the information available about
our specific history. These are sufficient for every prediction in science, and there
are no predictions that do not, at a fundamental level, involve all three forms of
information.
A unified theory of the dynamics of the basic fields has long been a goal of
elementary particle physics and may now be within reach. The equally fundamental,
equally necessary search for a theory of the initial state of the universe is the
objective of the discipline of quantum cosmology. These may even be related goals;
a single action function may describe both the Hamiltonian and the initial state.[1]
There has recently been much promising progress in the search for a theory
of the quantum initial condition of the universe.[2] Such diverse observations as
the large-scale homogeneity and isotropy of the universe, its approximate spatial
flatness, the spectrum of density fluctuations from which the galaxies grew, the
thermodynamic arrow of time, and the existence of classical spacetime may find a
unified, compressed explanation in a particular simple law of the initial condition.
The regularities exploited by the environmental sciences such as astronomy,
geology, and biology must ultimately be traceable to the simplicity of the initial
[1]As in the "no boundary" and the "tunneling from nothing" proposals, where the wave function
of the universe is constructed from the action by a Euclidean functional integral in the first case
or by boundary conditions on the implied Wheeler-DeWitt equation in the second. See, e.g., Refs.
27 and 53.
[2]For recent reviews see, e.g., J. J. Halliwell23 and J. B. Hartle.30,33 For a bibliography of papers
on quantum cosmology, see J. J. Halliwell.24
Quantum Mechanics in the Light of Quantum Cosmology 427
condition. Those regularities concern specific individual objects and not just re-
producible situations involving identical particles, atoms, etc. The fact that the
discovery of a bird in the forest or a fossil in a cliff or a coin in a ruin implies the
likelihood of discovering another similar bird or fossil or coin cannot be derived
from the laws of elementary particle physics alone; it must involve correlations that
stem from the initial condition.
The environmental sciences are not only strongly affected by the initial con-
ditions but are also heavily dependent on the outcomes of quantum-probabilistic
events during the history of the universe. The statistical results of, say, proton-
proton scattering in the laboratory are much less dependent on such outcomes.
However, during the last few years there has been increasing speculation that,
even in a unified fundamental theory, free of dimensionless parameters, some of
the observable characteristics of the elementary particle system may be quantum-
probabilistic, with a probability distribution that can depend on the initial condi-
tion.[3]
It is not our purpose in this article to review all these developments in quantum
cosmology. Rather, we will discuss the implications of quantum cosmology for one
of the subjects of this conference—the interpretation of quantum mechanics.
II. PROBABILITY
Even apart from quantum mechanics, there is no certainty in this world; therefore
physics deals in probabilities. In classical physics probabilities result from ignorance;
in quantum mechanics they are fundamental as well. In the last analysis, even when
treating ensembles statistically, we are concerned with the probabilities of particular
events. We then deal with the probabilities of deviations from the expected behavior
of the ensemble caused by fluctuations.
When the probabilities of particular events are sufficiently close to 0 or 1, we
make a definite prediction. The criterion for "sufficiently close to 0 or 1" depends on
the use to which the probabilities are put. Consider, for example, a prediction on the
basis of present astronomical observations that the sun will come up tomorrow at
5:59 AM ± 1 min. Of course, there is no certainty that the sun will come up at this
time. There might have been a significant error in the astronomical observations or
the subsequent calculations using them; there might be a non-classical fluctuation in
the earth's rotation rate or there might be a collision with a neutron star now racing
across the galaxy at near light speed. The prediction is the same as estimating the
probabilities of these alternatives as low. How low do they have to be before one
sleeps peacefully tonight rather than anxiously awaiting the dawn? The probabilities
[3]As, for example, in recent discussions of the value of the cosmological constant; see, e.g., S. W.
Hawking,35 S. Coleman,4 and S. Giddings and A. Strominger.20
428 Murray Gell-Mann and James B. Hartle
predicted by the laws of physics and the statistics of errors are generally agreed to
be low enough!
All predictions in science are, most honestly and most generally, the probabilis-
tic predictions of the time histories of particular events in the universe. In cosmology
we are necessarily concerned with probabilities for the single system that is the uni-
verse as a whole. Where the universe presents us effectively with an ensemble of
identical subsystems, as in experimental situations common in physics and chem-
istry, the probabilities for the ensemble as a whole yield definite predictions for the
statistics of identical observations. Thus, statistical probabilities can be derived, in
appropriate situations, from probabilities for the universe as a whole.13,26,21,11
Probabilities for histories need be assigned by physical theory only to the ac-
curacy to which they are used. Thus, it is the same to us for all practical purposes
whether physics claims the probability of the sun not coming up tomorrow is
10^(-1057) or 10^(-10^57), as long as it is very small. We can therefore conveniently consider ap-
proximate probabilities, which need obey the rules of the probability calculus only
up to some standard of accuracy sufficient for all practical purposes. In quantum
mechanics, as we shall see, it is likely that only by this means can probabilities be
assigned to interesting histories at all.
because

|ψ_L(y) + ψ_U(y)|^2 ≠ |ψ_L(y)|^2 + |ψ_U(y)|^2 .   (2)
If we have measured which slit the electron went through, then the interference is
destroyed, the sum rule obeyed, and we can meaningfully assign probabilities to
these alternative histories.
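The failure and restoration of this sum rule can be illustrated numerically with two toy Gaussian slit amplitudes (our illustrative model; the widths, separations, and phases are arbitrary):

```python
import numpy as np

y = np.linspace(-10, 10, 2001)             # positions on the screen

# Toy slit amplitudes at the screen (illustrative Gaussians with phase)
psi_U = np.exp(-(y - 1.0)**2) * np.exp(3j * y)
psi_L = np.exp(-(y + 1.0)**2) * np.exp(-3j * y)

# No which-slit record: probabilities interfere, Eq. (2)-style violation
p_coherent = np.abs(psi_U + psi_L)**2
p_sum = np.abs(psi_U)**2 + np.abs(psi_L)**2
print(np.max(np.abs(p_coherent - p_sum)))  # nonzero: sum rule violated

# Orthogonal which-slit records |U>, |L>: the cross term is multiplied
# by <U|L> = 0, so the diagonal sum rule is restored exactly.
overlap = 0.0
p_decohered = p_sum + 2 * np.real(np.conj(psi_U) * psi_L) * overlap
assert np.allclose(p_decohered, p_sum)
```

The cross term 2 Re(ψ_U* ψ_L ⟨U|L⟩) is the off-diagonal element whose suppression by a which-slit measurement is the simplest instance of the decoherence condition discussed below.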
It is a general feature of quantum mechanics that one needs a rule to determine
which histories can be assigned probabilities. The familiar rule of the "Copenhagen"
interpretations described above is external to the framework of wave function and
FIGURE 1 The two-slit experiment. An electron gun at right emits an electron traveling
towards a screen with two slits, its progress in space recapitulating its evolution in time.
When precise detections are made of an ensemble of such electrons at the screen it
is not possible, because of interference, to assign a probability to the alternatives of
whether an individual electron went through the upper slit or the lower slit. However, if
the electron interacts with apparatus which measures which slit it passed through, then
these alternatives decohere and probabilities can be assigned.
[4]See the essays "The Unity of Knowledge" and "Atoms and Human Knowledge," reprinted in
N. Bohr.2
[5]For clear statements of this point of view see F. London and E. Bauer, and R. B. Peierls.
the early universe when neither existed. There is no reason in general for a classical
domain to be fundamental or external in a basic formulation of quantum mechanics.
It was Everett who in 1957 first suggested how to generalize the Copenhagen
framework so as to apply quantum mechanics to cosmology.[6] His idea was to take
quantum mechanics seriously and apply it to the universe as a whole. He showed
how an observer could be considered part of this system and how its activities—
measuring, recording, and calculating probabilities—could be described in quantum
mechanics.
Yet the Everett analysis was not complete. It did not adequately explain the
origin of the classical domain or the meaning of the "branching" that replaced the
notion of measurement. It was a theory of "many worlds" (what we would rather
call "many histories"), but it did not sufficiently explain how these were defined
or how they arose. Also, Everett's discussion suggests that a probability formula is
somehow not needed in quantum mechanics, even though a "measure" is introduced
that, in the end, amounts to the same thing.
Here we shall briefly sketch a program aiming at a coherent formulation of
quantum mechanics for science as a whole, including cosmology as well as the en-
vironmental sciences.[7] It is an attempt at extension, clarification, and completion
of the Everett interpretation. It builds on many aspects of the post-Everett de-
velopments, especially the work of Zeh,56 Zurek,58,59 and Joos and Zeh.37 In the
discussion of history and at other points it is consistent with the insightful work (in-
dependent of ours) of Griffiths22 and Omnes.46,47,48 Our research is not complete,
but we sketch, in this report on its status, how it might become so.
[6]The original paper is by Everett.10 The idea was developed by many, among them Wheeler,
DeWitt, Geroch,19 and Mukhanov, and independently arrived at by others, e.g., Gell-Mann
and Cooper and Van Vechten.5 There is a useful collection of early papers on the subject in Ref. 8.
[7]Some elements of which have been reported earlier. See M. Gell-Mann.17
its apparatus of Hilbert space, states, Hamiltonian, and other operators. We shall
indicate the equivalence between the two, always possible in this approximation.
The approximation of a fixed background spacetime breaks down in the early
universe. There, a yet more fundamental sum-over-histories framework of quantum
mechanics may be necessary. In such a framework the notions of state, operators,
and Hamiltonian may be approximate features appropriate to the universe after
the Planck era, for particular initial conditions that imply an approximately fixed
background spacetime there. A discussion of quantum spacetime is essential for
any detailed theory of the initial condition, but when, as here, this condition is not
spelled out in detail and we are treating events after the Planck era, the familiar
formulation of quantum mechanics is an adequate approximation.
The interpretation of quantum mechanics that we shall describe in connection
with cosmology can, of course, also apply to any strictly closed sub-system of the
universe provided its initial density matrix is known. However, strictly closed sub-
systems of any size are not easily realized in the universe. Even slight interactions,
such as those of a planet with the cosmic background radiation, can be important
for the quantum mechanics of a system, as we shall see. Further, it would be ex-
traordinarily difficult to prepare precisely the initial density matrix of any sizeable
system so as to get rid of the dependence on the density matrix of the universe. In
fact, even those large systems that are approximately isolated today inherit many
important features of their effective density matrix from the initial condition of the
universe.
(B) HISTORIES
The three forms of information necessary for prediction in quantum cosmology
are represented in the Heisenberg picture as follows: The quantum state of the
universe is described by a density matrix p. Observables describing specific infor-
mation are represented by operators 0(t). For simplicity, but without loss of gen-
erality, we shall focus on non-"fuzzy", "yes-no" observables. These are represented
in the Heisenberg picture by projection operators P(t). The Hamiltonian, which
is the remaining form of information, describes evolution by relating the operators
corresponding to the same question at different times through

P_α^k(t) = e^{iHt/ħ} P_α^k(0) e^{−iHt/ħ} ,

where k labels the set, α the particular alternative, and t its time. An exhaustive set
of exclusive alternatives satisfies

Σ_α P_α^k(t) = 1 ,   P_α^k(t) P_α'^k(t) = δ_αα' P_α^k(t) .   (3)
For example, one such exhaustive set would specify whether a field at a point on a
surface of constant t is in one or another of a set of ranges exhausting all possible
values. The projections are simply the projections onto eigenstates of the field at
that point with values in these ranges. We should emphasize that an exhaustive set
of projections need not involve a complete set of variables for the universe (one-
dimensional projections)—in fact, the projections we deal with as observers of the
universe typically involve only an infinitesimal fraction of a complete set.
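In a finite-dimensional toy model (ours, not from the text), an exhaustive and exclusive set of range projections is easy to construct and check against these conditions:

```python
import numpy as np

dim = 8
x = np.arange(dim)                 # eigenvalues of a discretized "field" variable

# Ranges exhausting all values: {0..2}, {3..4}, {5..7}
ranges = [(0, 3), (3, 5), (5, 8)]
P = [np.diag(((x >= lo) & (x < hi)).astype(float)) for lo, hi in ranges]

# Exhaustive: the projections sum to the identity, as in Eq. (3)
assert np.allclose(sum(P), np.eye(dim))
# Exclusive: P_a P_b = delta_ab P_a
for a, Pa in enumerate(P):
    for b, Pb in enumerate(P):
        assert np.allclose(Pa @ Pb, Pa if a == b else 0.0)
```

Each projector here is far from one-dimensional, mirroring the remark that realistic observers only ever project onto an infinitesimal fraction of a complete set of variables.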
Sets of alternative histories consist of time sequences of exhaustive sets of al-
ternatives. A history is a particular sequence of alternatives, abbreviated [P_α] =
(P^1_{α_1}(t_1), P^2_{α_2}(t_2), ⋯, P^n_{α_n}(t_n)). A completely fine-grained history is specified by giv-
ing the values of a complete set of operators at all times. One history is a coarse
graining of another if the set [Pa] of the first history consists of sums of the [Pa]
of the second history. The inverse relation is fine graining. The completely coarse-
grained history is one with no projections whatever, just the unit operator!
The reciprocal relationships of coarse and fine graining evidently constitute
only a partial ordering of sets of alternative histories. Two arbitrary sets need not
be related to each other by coarse/fine graining. The partial ordering is represented
schematically in Figure 2, where each point stands for a set of alternative histories.
Feynman's sum-over-histories formulation of quantum mechanics begins by
specifying the amplitude for a completely fine-grained history in a particular basis
of generalized coordinates Q^i(t), say all fundamental field variables at all points in
space. This amplitude is proportional to

exp(iS[Q^i(t)]/ħ) ,   (5)
where S is the action functional that yields the Hamiltonian, H. When we employ
this formulation of quantum mechanics, we shall introduce the simplification of
ignoring fields with spins higher than zero, so as to avoid the complications of gauge
groups and of fermion fields (for which it is inappropriate to discuss eigenstates of
the field variables). The operators Q^i(t) are thus various scalar fields at different
points of space.
Let us now specialize our discussion of histories to the generalized coordinate
bases Q^i(t) of the Feynman approach. Later we shall discuss the necessary general-
ization to the case of an arbitrary basis at each time t, utilizing quantum-mechanical
transformation theory.
Completely fine-grained histories in the coordinate basis cannot be assigned
probabilities; only suitable coarse-grained histories can. There are at least three
common types of coarse graining: (1) specifying observables not at all times, but
FIGURE 2 The schematic structure of the space of sets of possible histories for the
universe. Each dot in this diagram represents an exhaustive set of alternative histories.
Such sets, denoted by {[P_α]} in the text, correspond in the Heisenberg picture to time
sequences (P^1_{α_1}(t_1), P^2_{α_2}(t_2), ⋯, P^n_{α_n}(t_n)) of sets of projection operators, such that at
each time t k the alternatives ak are an orthogonal and exhaustive set of possibilities
for the universe. At the bottom of the diagram are the completely fine-grained sets
of histories, each arising from taking projections onto eigenstates of a complete set
of observables for the universe at every time. For example, the set Q is the set in
which all field variables at all points of space are specified at every time. P might be
the completely fine-grained set in which all field momenta are specified at each time.
D might be a degenerate set of the kind discussed in Section VII in which the same
complete set of operators occurs at every time. But there are many other completely
fine-grained sets of histories corresponding to all possible combinations of complete
sets of observables that can be taken at every time.
The dots above the bottom row are coarse-grained sets of alternative histories.
If two dots are connected by a path, the one above is a coarse graining of the one
below—that is, the projections in the set above are sums of those in the set below.
At the very top is the degenerate case in which complete sums are taken at every
time, yielding no projections at all other than the unit operator! The space of sets of
alternative histories is thus partially ordered by the operation of coarse graining.
The heavy dots denote the decoherent sets of alternative histories. Coarse
grainings of decoherent sets remain decoherent. Maximal sets, the heavy dots
surrounded by circles, are those decohering sets for which there is no finer-grained
decoherent set.
only at some times; (2) specifying at any one time not a complete set of observables,
but only some of them; (3) specifying for these observables not precise values, but
only ranges of values. To illustrate all three, let us divide the Q^i up into variables
x^i and X^i and consider only sets of ranges {Δ_α^k} of the x^i at times t_k, k = 1, ⋯, n.
A set of alternatives at any one time consists of ranges Δ_α^k, which exhaust the
possible values of x^i as α ranges over all integers. An individual history is specified
by particular Δ_α's at the times t_1, ⋯, t_n. We write [Δ_α] = (Δ^1_{α_1}, ⋯, Δ^n_{α_n}) for a
particular history. A set of alternative histories is obtained by letting α_1 ⋯ α_n
range over all values.
Let us use the same notation [Δ_α] for the most general history that is a coarse
graining of the completely fine-grained history in the coordinate basis, specified by
ranges of the Q^i at each time, including the possibility of full ranges at certain
times, which eliminate those times from consideration.
In the sum-over-histories formulation, the decoherence functional for a pair of
completely fine-grained histories is

D[Q'^i(t), Q^i(t)] = δ(Q'_f − Q_f) exp{i(S[Q'^i(t)] − S[Q^i(t)])/ħ} ρ(Q'_0, Q_0) ,   (6)

and for a pair of coarse-grained histories [Δ_α'], [Δ_α] it is

D([Δ_α'], [Δ_α]) = ∫_[Δ_α'] δQ' ∫_[Δ_α] δQ δ(Q'_f − Q_f) exp{i(S[Q'^i(t)] − S[Q^i(t)])/ħ} ρ(Q'_0, Q_0) .   (7)

More precisely, the integral is as follows (Figure 3): It is over all histories Q'^i(t),
Q^i(t) that begin at Q'_0, Q_0 respectively, pass through the ranges [Δ_α'] and [Δ_α]
respectively, and wind up at a common point Q_f at any time t_f > t_n. It is completed
by integrating over Q'_0, Q_0, and Q_f.
The connection between coarse-grained histories and completely fine-grained
ones is transparent in the sum-over-histories formulation of quantum mechanics.
In the Heisenberg picture, the corresponding decoherence functional is

D([P_α'], [P_α]) = Tr[P^n_{α'_n}(t_n) ⋯ P^1_{α'_1}(t_1) ρ P^1_{α_1}(t_1) ⋯ P^n_{α_n}(t_n)] .   (8)

The projections in Eq. (8) are time ordered with the earliest on the inside. When
the P's are projections onto ranges Δ of values of the Q's, expressions (7) and (8)
agree. From the cyclic property of the trace it follows that D is always diagonal
in the final indices α'_n and α_n. (We assume throughout that the P's are bounded
operators in Hilbert space, dealing, for example, with projections onto ranges of the
Q's and not onto definite values of the Q's.) Decoherence is thus an interesting
notion only for strings of P's that involve more than one time. Decoherence is
automatic for "histories" that consist of alternatives at but one time.
Progressive coarse graining may be seen in the sum-over-histories picture as
summing over those parts of the fine-grained histories not specified in the coarse-
grained one, according to the principle of superposition. In the Heisenberg picture,
Eq. (8), the three common forms of coarse graining discussed above can be repre-
sented as follows: Summing on both sides of D over all P's at a given time and
using Eq. (3) eliminates those P's completely. Summing over all possibilities for
certain variables at one time amounts to factoring the P's and eliminating one of
the factors by summing over it. Summing over ranges of values of a given variable
at a given time corresponds to replacing the P's for the partial ranges by one for
the total range. Thus, if [P̄_β] is a coarse graining of the set of histories {[P_α]}, we
write

D([P̄_β'], [P̄_β]) = Σ_{all P_α' fixed by [P̄_β']} Σ_{all P_α fixed by [P̄_β]} D([P_α'], [P_α]) .   (9)
In the most general case, we may think of the completely fine-grained limit as
obtained from the coordinate representation by arbitrary unitary transformations
at all times. All histories can be obtained by coarse-graining the various completely
fine-grained ones, and coarse graining in its most general form involves taking ar-
bitrary sums of P's, as discussed earlier. We may use Eq. (9) in the most general
case where [P̄_β] is a coarse graining of [P_α].
A set of coarse-grained alternative histories is said to decohere when the off-
diagonal elements of D are sufficiently small:

D([P_α'], [P_α]) ≈ 0 ,   for any α'_k ≠ α_k .   (10)
This is a generalization of the condition for the absence of interference in the two-
slit experiment (approximate equality of the two sides of Eq. (2)). It is a sufficient
(although not a necessary) condition for the validity of the purely diagonal formula
D([P̄_β], [P̄_β]) ≈ Σ_{all P_α fixed by [P̄_β]} D([P_α], [P_α]) .   (11)
The rule for when probabilities can be assigned to histories of the universe is
then this: To the extent that a set of alternative histories decoheres, probabili-
ties can be assigned to its individual members. The probabilities are the diagonal
elements of D. Thus,
p([P_α]) = D([P_α], [P_α])
= Tr[P^n_{α_n}(t_n) ⋯ P^1_{α_1}(t_1) ρ P^1_{α_1}(t_1) ⋯ P^n_{α_n}(t_n)]   (12)
when the set decoheres. We will frequently write p(antn , • • • , al ti ) for these prob-
abilities, suppressing the labels of the sets.
The probabilities defined by Eq. (12) obey the rules of probability theory as a
consequence of decoherence. The principal requirement is that the probabilities be
additive on "disjoint sets of the sample space". For histories this gives the sum rule
p([P̄_β]) = Σ_{all P_α fixed by [P̄_β]} p([P_α]) .   (13)
These relate the probabilities for a set of histories to the probabilities for all coarser
grained sets that can be constructed from it. For example, the sum rule eliminating
all projections at only one time is

Σ_{α_k} p(α_n t_n, ⋯, α_{k+1} t_{k+1}, α_k t_k, α_{k−1} t_{k−1}, ⋯, α_1 t_1)
= p(α_n t_n, ⋯, α_{k+1} t_{k+1}, α_{k−1} t_{k−1}, ⋯, α_1 t_1) .   (14)
These rules follow trivially from Eqs. (11) and (12). The other requirements from
probability theory are that the probability of the whole sample space be unity, an
easy consequence of Eq. (11) when complete coarse graining is performed, and that
the probability for an empty set be zero, which means simply that the probability
of any sequence containing a projection P = 0 must vanish, as it does.
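The decoherence condition and the resulting probability rules can be checked in a toy model. The following sketch (a single qubit with trivial dynamics, H = 0, so Heisenberg and Schrödinger projections coincide; all choices here are illustrative assumptions, not taken from the text) builds the decoherence functional of Eq. (8) for two-time histories, verifies that one ordering of alternatives decoheres as in Eq. (10) with diagonal probabilities summing to unity as required by Eq. (12), while another set exhibits interference:

```python
import numpy as np

# Toy check of Eqs. (8), (10), (12): decoherence functional for two-time
# histories of a single qubit.  The choice H = 0 (so Heisenberg projections
# equal Schrodinger ones) and the bases used are illustrative assumptions.

ket0 = np.array([1.0, 0.0]); ket1 = np.array([0.0, 1.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2); ketm = np.array([1.0, -1.0]) / np.sqrt(2)
proj = lambda v: np.outer(v, v.conj())

rho = proj(ket0)                      # pure initial condition
z = [proj(ket0), proj(ket1)]          # alternatives in the z basis
x = [proj(ketp), proj(ketm)]          # alternatives in the x basis

def D(hist_a, hist_b, rho):
    """Decoherence functional of Eq. (8): Tr[P2 P1 rho P1' P2']."""
    (P1a, P2a), (P1b, P2b) = hist_a, hist_b
    return np.trace(P2a @ P1a @ rho @ P1b @ P2b)

# z alternatives at t1, x alternatives at t2: this set decoheres exactly.
hists = [(P1, P2) for P1 in z for P2 in x]
Dmat = np.array([[D(a, b, rho) for b in hists] for a in hists])
assert np.allclose(Dmat, np.diag(np.diag(Dmat)))        # Eq. (10)
probs = np.real(np.diag(Dmat))
assert np.isclose(probs.sum(), 1.0)                     # Eq. (12) diagonal

# Reversed order (x alternatives first) interferes: no decoherence.
hists2 = [(P1, P2) for P1 in x for P2 in z]
Dmat2 = np.array([[D(a, b, rho) for b in hists2] for a in hists2])
assert not np.allclose(Dmat2, np.diag(np.diag(Dmat2)))
```

The second set fails to decohere for the same reason the two-slit amplitudes interfere: the initial state connects the off-diagonal pairs of x alternatives.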
The p([P_\alpha]) are approximate probabilities for histories, in the sense of Section
II, up to the standard set by decoherence. Conversely, if a given standard for the
probabilities is required by their use, it can be met by coarse graining until Eqs.
(10) and (13) are satisfied at the requisite level.
Further coarse graining of a decoherent set of alternative histories produces
another set of decoherent histories since the probability sum rules continue to be
obeyed. That is illustrated in Figure 2, which makes it clear that in a progression
from the trivial completely coarse-grained set toward completely fine-grained sets,
there are sets of histories for which further fine graining always results in loss of
decoherence.
These are the maximal sets of alternative decohering histories.
These rules for probability exhibit another important feature: The operators
in Eq. (12) are time-ordered. Were they not time-ordered (zig-zags) we could have
assigned non-zero probabilities to conflicting alternatives at the same time. The
time ordering thus expresses causality in quantum mechanics, a notion that is ap-
propriate here because of the approximation of fixed background spacetime. The
time ordering is related as well to the "arrow of time" in quantum mechanics, which
we discuss below.
Given this discussion, the fundamental formula of quantum mechanics may be
reasonably taken to be

D([P_{\alpha'}], [P_\alpha]) \approx \delta_{\alpha'\alpha}\, p([P_\alpha]) \qquad (15)

for all [P_\alpha] in a set of alternative histories. Vanishing of the off-diagonal elements of
D gives the rule for when probabilities may be consistently assigned. The diagonal
elements give their values.
We could have used a weaker condition than Eq. (10) as the definition of de-
coherence, namely the necessary condition for the validity of the sum rules (11) of
probability theory:

D([P_{\alpha'}], [P_\alpha]) + D([P_\alpha], [P_{\alpha'}]) \approx 0 \qquad (16)

for any \alpha'_k \neq \alpha_k, or equivalently

\mathrm{Re}\, D([P_{\alpha'}], [P_\alpha]) \approx 0 . \qquad (17)
This is the condition used by Griffiths22 as the requirement for "consistent histo-
ries". However, while, as we shall see, it is easy to identify physical situations in
which the off-diagonal elements of D approximately vanish as the result of coarse
graining, it is hard to think of a general mechanism that suppresses only their real
parts. In the usual analysis of measurement, the off-diagonal parts of D approx-
imately vanish. We shall, therefore, explore the stronger condition of Eq. (10) in
what follows. That difference should not obscure the fact that in this part of our
work we have reproduced what is essentially the approach of Griffiths,22 extended
by Omnès.46,47,48
For example, the probability for predicting alternatives \alpha_{k+1}, \ldots, \alpha_n, given that the
alternatives \alpha_1, \ldots, \alpha_k have already happened, is

p(\alpha_n t_n, \ldots, \alpha_{k+1} t_{k+1} \mid \alpha_k t_k, \ldots, \alpha_1 t_1) = \frac{p(\alpha_n t_n, \ldots, \alpha_1 t_1)}{p(\alpha_k t_k, \ldots, \alpha_1 t_1)} . \qquad (19)

The probability that \alpha_{n-1}, \ldots, \alpha_1 happened in the past, given present data sum-
marized by an alternative \alpha_n at the present time t_n, is

p(\alpha_{n-1} t_{n-1}, \ldots, \alpha_1 t_1 \mid \alpha_n t_n) = \frac{p(\alpha_n t_n, \ldots, \alpha_1 t_1)}{p(\alpha_n t_n)} . \qquad (20)
Decoherence ensures that the probabilities defined by Eqs. (18)-(20) will approxi-
mately add to unity when summed over all remaining alternatives, because of Eq.
(14).
Despite the similarity between Eqs. (19) and (20), there are differences between
prediction and retrodiction. Future predictions can all be obtained from an effective
density matrix summarizing information about what has happened. If \rho_{\rm eff} is defined
by

\rho_{\rm eff} = \frac{P^k_{\alpha_k}(t_k) \cdots P^1_{\alpha_1}(t_1)\,\rho\,P^1_{\alpha_1}(t_1) \cdots P^k_{\alpha_k}(t_k)}{\mathrm{Tr}[P^k_{\alpha_k}(t_k) \cdots P^1_{\alpha_1}(t_1)\,\rho\,P^1_{\alpha_1}(t_1) \cdots P^k_{\alpha_k}(t_k)]} \qquad (21)

then

p(\alpha_n t_n, \ldots, \alpha_{k+1} t_{k+1} \mid \alpha_k t_k, \ldots, \alpha_1 t_1)
= \mathrm{Tr}[P^n_{\alpha_n}(t_n) \cdots P^{k+1}_{\alpha_{k+1}}(t_{k+1})\,\rho_{\rm eff}\,P^{k+1}_{\alpha_{k+1}}(t_{k+1}) \cdots P^n_{\alpha_n}(t_n)] . \qquad (22)
It has been suggested28,29,31,32 that, for application to highly quantum-mechanical spacetime,
as in the very early universe, quantum mechanics should be generalized to yield a framework in
which both time orderings are treated simultaneously in the sum-over-histories approach. This
involves including both exp(iS) and exp(-iS) for each history and has as a consequence an
evolution equation (the Wheeler-DeWitt equation) that is second order in the time variable. The
suggestion is that the two time orderings decohere when the universe is large and spacetime
classical, so that the usual framework with just one ordering is recovered.
440 Murray Gell-Mann and James B. Hartle
simple condition in what we call "the past". For example, the indicated present
homogeneity of the thermodynamic arrow of time can be traced to the near homo-
geneity of the "early" universe implied by p and the implication that the progenitors
of approximately isolated subsystems started out far from equilibrium at "early"
times.
Much has been made of the updating of the fundamental probability formula in
Eq. (19) and in Eqs. (21) and (22). By utilizing Eq. (21) the process of prediction
may be organized so that for each time there is a \rho_{\rm eff} from which probabilities
for the future may be calculated. The action of each projection, P, on both sides
of p in Eq. (21) along with the division by the appropriate normalizing factor is
then sometimes called the "reduction of the wave packet". But this updating of
probabilities is no different from the classical reassessment of probabilities that
occurs after new information is obtained. In a sequence of horse races, the joint
probability for the winners of eight races is converted, after the winners of the
first three are known, into a reassessed probability for the remaining five races by
exactly this process. The main thing is that, because of decoherence, the sum rules
for probabilities are obeyed; once that is true, reassessment of probabilities is trivial.
The only non-trivial aspect of the situation is the choice of the string of P's in
Eq. (8) giving a decoherent set of histories.
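The horse-race analogy can be made concrete in a few lines: the "reduction" of Eqs. (19) and (21) is numerically nothing but conditioning a joint distribution on known outcomes. In this sketch the number of races and horses, and the randomly generated joint distribution, are illustrative assumptions:

```python
import itertools
import numpy as np

# The horse-race analogy made concrete: "reduction of the wave packet"
# updates probabilities exactly as classical conditioning does, Eq. (19).
# Three races with two horses each, and a random joint distribution,
# are illustrative assumptions.

rng = np.random.default_rng(0)
outcomes = list(itertools.product([0, 1], repeat=3))   # winners of 3 races
p_joint = rng.dirichlet(np.ones(len(outcomes)))        # some joint distribution
joint = dict(zip(outcomes, p_joint))

# After race 1 is won by horse 0, reassess the remaining races as in
# Eq. (19): keep consistent outcomes, divide by the probability of what
# actually happened.
p_first = sum(p for o, p in joint.items() if o[0] == 0)
reassessed = {o[1:]: p / p_first for o, p in joint.items() if o[0] == 0}

assert np.isclose(sum(reassessed.values()), 1.0)       # still normalized
```

Nothing quantum-mechanical enters this step; what is quantum-mechanical, as the text stresses, is the decoherence that licenses assigning the joint probabilities in the first place.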
When the initial density matrix is pure,

\rho = |\Psi\rangle\langle\Psi| . \qquad (23)

The initial state may be decomposed according to the projection operators that
define the set of alternative histories:

|\Psi\rangle = \sum_{\alpha_1, \ldots, \alpha_n} P^n_{\alpha_n}(t_n) \cdots P^1_{\alpha_1}(t_1)\,|\Psi\rangle \equiv \sum_{\alpha_1, \ldots, \alpha_n} |[P_\alpha], \Psi\rangle . \qquad (24)
If the P^n_{\alpha_n}(t_n) for the last time t_n in Eq. (8) were all projections onto pure states, D would
factor for a pure \rho and could never satisfy Eq. (10), except for certain special kinds of
histories described near the end of Section VII, in which decoherence is automatic,
independent of p. Similarly, it is not difficult to show that some coarse graining is
required at any time in order to have decoherence of previous alternatives, with the
same set of exceptions.
After normalization, the states |[P_\alpha], \Psi\rangle represent the individual histories or
individual branches in the decohering set. We may, as for the effective density
matrix of Eq. (21), summarize present information for prediction just by giving one
of these states, with projections up to the present.
{\tilde P^k_\alpha} will have an identical decoherence functional to the sets constructed from the
corresponding {P^k_\alpha}. If one set decoheres, the other will, and the probabilities for
the individual histories will be the same.
In a similar way, decoherence and probabilities are invariant under arbitrary
reassignments of the times in a string of P's (as long as they continue to be ordered),
with the projection operators at the altered times unchanged as operators in Hilbert
space. This is because in the Heisenberg picture every projection is at any time a
projection operator for some quantity.
The histories arising from constant unitary transformations or from reassign-
ment of times of a given set of P's will, in general, have very different descriptions
in terms of fundamental fields from that of the original set. We are considering
transformations such as Eq. (27) in an active (or alibi) sense so that the field op-
erators and Hamiltonian are unchanged. (The passive (or alias) transformations,
in which these are transformed, are easily understood.) A set of projections onto
the ranges of field values in a spatial region is generally transformed by Eq. (27) or
by any reassignment of the times into an extraordinarily complicated combination
of all fields and all momenta at all positions in the universe! Histories consisting
of projections onto values of similar quantities at different times can thus become
histories of very different quantities at various other times.
In ordinary presentations of quantum mechanics, two histories with different
descriptions can correspond to physically distinct situations because it is presumed
that various different Hermitian combinations of field operators are potentially mea-
surable by different kinds of external apparatus. In quantum cosmology, however,
apparatus and system are considered together and the notion of physically distinct
situations may have a different character.
commuting projection operators, and the factors of P's for different times often fail
to commute with one another, for example, factors that are projections onto related
ranges of values of the same Heisenberg operator at different times. However, these
non-commuting factors may be correlated, given p, with other projection factors
that do commute or, at least, effectively commute inside the trace with the density
matrix p in Eq. (8) for the decoherence functional. In fact, these other projection
factors may commute with all the subsequent P's and thus allow themselves to be
moved to the outside of the trace formula. When all the non-commuting factors are
correlated in this manner with effectively commuting ones, then the off-diagonal
terms in the decoherence functional vanish, in other words, decoherence results. Of
course, all this behavior may be approximate, resulting in approximate decoherence.
This type of situation is fundamental in the interpretation of quantum me-
chanics. Non-commuting quantities, say at different times, may be correlated with
commuting or effectively commuting quantities because of the character of p and
H, and thus produce decoherence of strings of P's despite their non-commutation.
For a pure p, for example, the behavior of the effectively commuting variables leads
to the orthogonality of the branches of the state IIP), as defined in Eq. (24). We
shall see that correlations of this character are central to understanding historical
records (Section X) and measurement situations (Section XI).
As an example of decoherence produced by this mechanism, consider a coarse-
grained set of histories defined by time sequences of alternative approximate local-
izations of a massive body such as a planet or even a typical interstellar dust grain.
As shown by Joos and Zeh,37 even if the successive localizations are spaced as closely
as a nanosecond, such histories decohere as a consequence of scattering by the 3° K
cosmic background radiation (if for no other reason). Different positions become
correlated with nearly orthogonal states of the photons. More importantly, each al-
ternative sequence of positions becomes correlated with a different orthogonal state
of the photons at the final time. This accomplishes the decoherence and we may
loosely say that such histories of the position of a massive body are "decohered"
by interaction with the photons of the background radiation.
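The mechanism just described can be caricatured numerically: if each scattered photon ends up in a state only slightly different for the two alternative positions, the off-diagonal element of the decoherence functional is weighted by the product of the single-photon overlaps and so falls exponentially with the number of scattering events. The per-photon overlap and the event counts below are illustrative assumptions, not figures from Joos and Zeh:

```python
import numpy as np

# Sketch of environment-induced suppression: a per-photon overlap
# cos(theta) close to 1 still gives exponential decay of the off-diagonal
# weight |<E_x'|E_x>| = cos(theta)**N after N scatterings.
# theta and the values of N are illustrative assumptions.

theta = 0.01                        # per-photon distinguishability (assumed)
overlap = np.cos(theta)

for N in (1_000, 100_000, 10_000_000):
    suppression = overlap ** N      # off-diagonal weight after N events
    print(N, suppression)
```

Even a tiny per-event distinguishability, compounded over the enormous scattering rates of a macroscopic body, drives the off-diagonal terms to effectively zero, which is the content of the statement that such histories are "decohered" by the background photons.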
Other specific models of decoherence have been discussed by many authors,
among them Joos and Zeh,37 Caldeira and Leggett,3 and Zurek. Typically these
discussions have focussed on a coarse graining that involves only certain variables
analogous to the position variables above. Thus the emphasis is on particular non-
commuting factors of the projection operators and not on correlated operators that
may be accomplishing the approximate decoherence. Such coarse grainings do not,
in general, yield the most refined approximately decohering sets of histories, since
one could include projections onto ranges of values of the correlated operators
without losing the decoherence.
The simplest model consists of a single oscillator interacting bilinearly with a
large number of others, and a coarse graining which involves only the coordinates of
the special oscillator. Let x be the coordinate of the special oscillator, M its mass,
\omega_R its frequency renormalized by its interactions with the others, and S_{\rm free} its free
action. Consider the special case where the density matrix of the whole system,
referred to an initial time, factors into the product of a density matrix p(x', x) of
the distinguished oscillator and another for the rest. Then, generalizing slightly a
treatment of Feynman and Vernon,12 we can write D defined by Eq. (7) as
the intervals [\Delta_\alpha] referring here only to the variables of the distinguished oscillator.
The sum over the rest of the oscillators has been carried out and is summarized by
the Feynman-Vernon influence functional exp(iW [x' (t), x(t)]). The remaining sum
over x'(t) and x(t) is as in Eq. (7).
The case when the other oscillators are in an initial thermal distribution has
been extensively investigated by Caldeira and Leggett.3 In the simple limit of a
uniform continuum of oscillators cut off at frequency \Omega and in the Fokker-Planck
limit of kT \gg \hbar\Omega \gg \hbar\omega_R, they find
What the above models convincingly show is that decoherence will be wide-
spread in the universe for certain familiar "classical" variables. The answer to
Fermi's question to one of us of why we don't see Mars spread out in a quan-
tum superposition of different positions in its orbit is that such a superposition
would rapidly decohere. We now proceed to a more detailed discussion of such
decoherence.
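The scale of the effect can be illustrated with the standard high-temperature Caldeira-Leggett result (not quoted in full above), in which off-diagonal elements of the reduced density matrix decay as exp[-(2M\gamma kT/\hbar^2)(x - x')^2 t], so that the decoherence time is shorter than the relaxation time 1/\gamma by the dimensionless factor computed below. The mass, temperature, and separation are illustrative assumptions:

```python
# Rough numerical illustration of why a "Mars in superposition" decoheres.
# Uses the standard high-temperature Caldeira-Leggett decay
# exp[-(2 M gamma k T / hbar^2)(x - x')^2 t]; then
# tau_decoherence / tau_relaxation = hbar^2 / (2 M k T dx^2).
# The body (1 g at 300 K, positions 1 cm apart) is an illustrative assumption.

hbar, k = 1.054571817e-34, 1.380649e-23     # SI values

def dec_over_relax(M, T, dx):
    """Ratio of decoherence time to relaxation time 1/gamma."""
    return hbar**2 / (2.0 * M * k * T * dx**2)

ratio = dec_over_relax(M=1e-3, T=300.0, dx=1e-2)
print(ratio)           # fantastically small: coherence between the two
assert ratio < 1e-30   # positions is lost on any dissipative timescale
```

Whatever the dissipation rate \gamma, the superposition of macroscopically distinct positions loses coherence some forty orders of magnitude faster than it loses energy, which is the quantitative content of the answer to Fermi's question.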
capacity of any set of observers—they can cover phenomena in all parts of the
universe and at all epochs that could be observed, whether or not any observer was
present. Maximal sets are the most refined descriptions of the universe that may
be assigned probabilities in quantum mechanics.
The class of maximal sets possible for the universe depends, of course, on the
completely fine-grained histories that are presented by the actual quantum theory
of the universe. If we utilize to the full, at each moment of time, all the projections
permitted by transformation theory, which gives quantum mechanics its protean
character, then there is an infinite variety of completely fine-grained sets, as illus-
trated in Figure 2. However, were there some fundamental reason to restrict the
completely fine grained sets, as would be the case if sum-over-histories quantum
mechanics were fundamental, then the class of maximal sets would be smaller as
illustrated in Figure 4. We shall proceed as if all fine grainings are allowed.
If a full correlation exists between a projection in a coarse graining and an-
other projection not included, then the finer graining including both still defines
a decoherent set of histories. In a maximal set of decoherent histories, both corre-
lated projections must be included if either one is included. Thus, in the mechanism
has the same value it would have had when computed with the density matrix of
the universe, \rho, for a given set of coarse-grained histories. The density matrix \tilde\rho thus
reproduces the decoherence functional for this set of histories, and in particular their
probabilities, but possesses as little information as possible beyond those properties.
A fine graining of a set of alternative histories leads to more conditions on \tilde\rho of
the form (32) than in the coarser-grained set. In nontrivial cases S(\tilde\rho) is, therefore,
lowered and \tilde\rho becomes closer to \rho.
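The statement that nontrivial fine graining lowers S(\tilde\rho) can be seen in the simplest case of commuting projections at a single time, where the maximum-entropy density matrix compatible with the history probabilities spreads each probability uniformly over the corresponding subspace. The dimensions and probabilities in this sketch are illustrative assumptions:

```python
import numpy as np

# Illustration of S(rho-tilde) decreasing under fine graining: for
# commuting one-time projections, the maximum-entropy density matrix
# compatible with conditions of the form (32) is diagonal, spreading each
# coarse probability uniformly over its subspace.  Numbers are illustrative.

def S(p):
    """von Neumann entropy of a diagonal density matrix (in nats)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Coarse graining: projections onto span{|0>,|1>} with probability 0.7
# and span{|2>,|3>} with probability 0.3.
rho_coarse = np.array([0.35, 0.35, 0.15, 0.15])

# A consistent fine graining: one projector per basis state
# (0.4 + 0.3 = 0.7 and 0.2 + 0.1 = 0.3, so the sum rules hold).
rho_fine = np.array([0.40, 0.30, 0.20, 0.10])

assert S(rho_fine) < S(rho_coarse)   # more conditions, lower entropy
```

A redundant "refinement" that merely repeats the same projections would add no new conditions and leave S(\tilde\rho) unchanged, which is exactly the diagnostic role the text assigns to this quantity.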
If the insertion of apparently new P's into a chain is redundant, then S(\tilde\rho) will
not be lowered. A simple example will help to illustrate this: Consider the set of
histories consisting of projections P_i(t_m) which project onto an orthonormal basis
for Hilbert space at one time, t_m. Trivial further decoherent fine grainings can be
constructed as follows: At each other time t_k introduce a set of projections \tilde P_i(t_k)
that, through the equations of motion, are identical operators in Hilbert space to
the set P_i(t_m). In this way, even though we are going through the motions of
introducing a completely fine-grained set of histories covering all the times, we are
really just repeating the projections P_i(t_m) over and over again. We thus have a
completely fine-grained set of histories that, in fact, consists of just one fine-grained
set of projections and decoheres exactly because there is only one such set. Indeed,
in terms of S(\tilde\rho) it is no closer to maximality than the set consisting of P_i(t_m) at
one time. The quantity S(\tilde\rho) thus serves to identify such trivial refinements, which
amount to redundancy in the conditions (32).
We can generalize the example in an interesting way by constructing the special
kinds of histories mentioned after Eq. (25). We take t_m to be the final time and then
adjoin, at earlier and earlier times, a succession of progressive coarse grainings of
the set {P_i(t_m)}. Thus, as time moves forward, the only projections are finer and
finer grainings terminating in the one-dimensional P_i(t_m). We thus have again a
set of histories in which decoherence is automatic, independent of the character of
\rho, and for which S(\tilde\rho) has the same value it would have had if only conditions at
the final time had been considered.
In a certain sense, S(\tilde\rho) for histories can be regarded as decreasing with time.
If we consider S(\tilde\rho) for a string of alternative projections up to a certain time t_n,
as in Eq. (32), and then adjoin an additional set of projections for a later time, the
number of conditions on \tilde\rho is increased and thus the value of S(\tilde\rho) is decreased (or,
in trivial cases, unchanged). That is natural, since S(\tilde\rho) is connected with the lack
of information contained in a set of histories and that information increases with
non-trivial fine graining of the histories, no matter what the times for which the
new P's are introduced. (In some related problems, a quantity like S that keeps
decreasing as a result of adjoining projections at later times can be converted into
an increasing quantity by adding an algorithmic complexity term.61)
The quantity S(\tilde\rho) is closely related to other fundamental quantities in physics.
One can show, for example, that when used with the \rho_{\rm eff} representing present data
and with alternatives at a single time, these techniques give a unified and generalized
treatment of the variety of coarse grainings commonly introduced in statistical
mechanics; and, as Jaynes and others have pointed out, the resulting S(\tilde\rho)'s are
the physical entropies of statistical mechanics. Here, however, these techniques are
applied to time histories and the initial condition is utilized. The quantity S(\tilde\rho) is
also related to the notion of thermodynamic depth currently being investigated by
Lloyd.43
VIII. CLASSICITY
Some maximal sets will be more nearly classical than others. The more nearly
classical sets of histories will contain projections (onto related ranges of values) of
operators, for different times, that are connected to one another by the unitary
transformations e^{-iH(t''-t')/\hbar} and that are correlated for the most part along classical
paths, with probabilities near zero and one for the successive projections. This
pattern of classical correlation may be disturbed by the inclusion, in the maximal
set of projection operators, of other variables, which do not behave in this way
(as in measurement situations to be described later). The pattern may also be
See, e.g., E. Joos,38 H. Zeh, C. Kiefer,50 J. Halliwell,25 and T. Fukuyama and M. Morikawa.18
species. The sizes of the volumes are limited above by maximality and are limited
below by classicity because they require sufficient "inertia" to enable them to resist
deviations from predictability caused by their interactions with one another, by
quantum spreading, and by the quantum and statistical fluctuations resulting from
interactions with the rest of the universe. Suitable integrals of densities of approx-
imately conserved quantities are thus candidates for habitually decohering quasi-
classical operators. Field theory is local, and it is an interesting question whether
that locality somehow picks out local densities as the source of habitually decoher-
ing quantities. It is hardly necessary to note that such hydrodynamic variables are
among the principal variables of classical physics.[13]
In the case of densities of conserved quantities, the integrals would not change
at all if the volumes were infinite. For smaller volumes we expect approximate per-
sistence. When, as in hydrodynamics, the rates of change of the integrals form part
of an approximately closed system of equations of motion, the resulting evolution
is just as classical as in the case of persistence.
X. BRANCH DEPENDENCE
As the discussion in Sections V and IX shows, physically interesting mechanisms
for decoherence will operate differently in different alternative histories for the uni-
verse. For example, hydrodynamic variables defined by a relatively small set of
volumes may decohere at certain locations in spacetime in those branches where a
gravitationally condensed body (e.g., the earth) actually exists, and may not deco-
here in other branches where no such condensed body exists at that location. In the
latter branch there simply may not be enough "inertia" for densities defined with
such small volumes to resist deviations from predictability. Similarly, alternative spin
directions associated with Stern-Gerlach beams may decohere for those branches
on which a photographic plate detects their beams and not in a branch where they
recombine coherently instead. There are no variables that are expected to decohere
universally. Even the mechanisms causing spacetime geometry at a given location
to decohere on scales far above the Planck length cannot necessarily be expected
to operate in the same way on a branch where the location is the center of a black
hole as on those branches where there is no black hole nearby.
How is such "branch dependence" described in the formalism we have elabo-
rated? It is not described by considering histories where the set of alternatives at
one time (the k in a set of P^k_\alpha) depends on specific alternatives (the \alpha's) of sets
of earlier times. Such dependence would destroy the derivation of the probability
sum rules from the fundamental formula. However, there is no such obstacle to the
set of alternatives at one time depending on the sets of alternatives at all previous
[13] For discussion of how such hydrodynamic variables are distinguished in non-equilibrium sta-
tistical mechanics in not unrelated ways see, e.g., L. Kadanoff and P. Martin,39 D. Forster,14 and
J. Lebowitz.42
strongly correlated with the quasiclassical ones at particular times. Such operators,
not normally decohering, are, in fact, included among the decohering set only by
virtue of their correlation with a habitually decohering one. In this case we have a
measurement situation of the kind usually discussed in quantum mechanics. Sup-
pose, for example, in the inevitable Stern-Gerlach experiment, that \sigma_z of a spin-1/2
particle is correlated with the orbit of an atom in an inhomogeneous magnetic field.
If the two orbits decohere because of interaction with something else (the atomic
excitations in a photographic plate for example), then the spin direction will be
included in the maximal set of decoherent histories, fully correlated with the deco-
hering orbital directions. The spin direction is thus measured.
The recovery of the Copenhagen rule for when probabilities may be assigned
is immediate. Measured quantities are correlated with decohering histories. De-
cohering histories can be assigned probabilities. Thus in the two-slit experiment
(Figure 1), when the electron interacts with an apparatus that determines which
slit it passed through, it is the decoherence of the alternative configurations of the
apparatus that enables probabilities to be assigned for the electron.
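The two-slit statement can be written out in one line of algebra: with the electron amplitudes through the slits correlated to apparatus states |A_1\rangle and |A_2\rangle, the interference term at the screen is weighted by \langle A_1|A_2\rangle and vanishes when the apparatus records which slit was used. The amplitudes in this sketch are illustrative assumptions:

```python
import numpy as np

# Which-path correlation killing interference, in operator form: the
# screen intensity is |psi1|^2 + |psi2|^2 + 2 Re(psi1* psi2 <A1|A2>).
# The two amplitudes and the phase are illustrative assumptions.

psi1, psi2 = 0.6, 0.8 * np.exp(1j * 0.7)    # amplitudes via slits 1 and 2

def intensity(overlap):
    """Screen intensity for a given apparatus-state overlap <A1|A2>."""
    return abs(psi1)**2 + abs(psi2)**2 + 2 * np.real(np.conj(psi1) * psi2 * overlap)

no_detector = intensity(1.0)    # apparatus uncorrelated: full interference
which_path = intensity(0.0)     # apparatus records the slit: orthogonal states

assert np.isclose(which_path, abs(psi1)**2 + abs(psi2)**2)  # classical sum
```

The classical addition of probabilities for the two alternatives is thus not postulated; it is delivered by the decoherence of the apparatus configurations, exactly as the text states.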
Correlation between the ranges of values of operators of a quasiclassical domain
is the only defining property of a measurement situation. Conventionally, measure-
ments have been characterized in other ways. Essential features have been seen to
be irreversibility, amplification beyond a certain level of signal-to-noise, association
with a macroscopic variable, the possibility of further association with a long chain
of such variables, and the formation of enduring records. Efforts have been made
to attach some degree of precision to words like "irreversible", "macroscopic", and
"record", and to discuss what level of "amplification" needs to be achieved 1141 While
such characterizations of measurement are difficult to define preciselyP5) some can
be seen in a rough way to be consequences of the definition that we are attempting
to introduce here, as follows:
Correlation of a variable with the quasiclassical domain (actually, inclusion in
its set of histories) accomplishes the amplification beyond noise and the association
with a macroscopic variable that can be extended to an indefinitely long chain of
such variables. The relative predictability of the classical world is a generalized form
of record. The approximate constancy of, say, a mark in a notebook is just a special
case; persistence in a classical orbit is just as good.
Irreversibility is more subtle. One measure of it is the cost (in energy, money,
etc.) of tracking down the phases specifying coherence and restoring them. This is
intuitively large in many typical measurement situations. Another, related measure
is the negative of the logarithm of the probability of doing so. If the probability of
restoring the phases in any particular measurement situation were significant, then
we would not have the necessary amount of decoherence. The correlation could
not be inside the set of decohering histories. Thus, this measure of irreversibility is
large. Indeed, in many circumstances where the phases are carried off to infinity or
lost in photons impossible to retrieve, the probability of recovering them is truly
zero and the situation perfectly irreversible—infinitely costly to reverse and with
zero probability for reversal!
Defining a measurement situation solely as the existence of correlations in a qua-
siclassical domain, if suitable general definitions of maximality and classicity can be
found, would have the advantages of clarity, economy, and generality. Measurement
situations occur throughout the universe and without the necessary intervention of
anything as sophisticated as an "observer". Thus, by this definition, the production
of fission tracks in mica deep in the earth by the decay of a uranium nucleus leads
to a measurement situation in a quasiclassical domain in which the track directions
decohere, whether or not these tracks are ever registered by an "observer".
coarse graining is very much coarser than that of the quasiclassical domain since it
utilizes only a few of the variables in the universe.
The reason such systems as IGUSes exist, functioning in such a fashion, is to
be sought in their evolution within the universe. It seems likely that they evolved to
make predictions because it is adaptive to do so.[16] The reason, therefore, for their
focus on decohering variables is that these are the only variables for which predic-
tions can be made. The reason for their focus on the histories of a quasiclassical
domain is that these present enough regularity over time to permit the generation
of models (schemata) with significant predictive power.
If there is essentially only one quasiclassical domain, then naturally the IGUS
utilizes further coarse grainings of it. If there are many essentially inequivalent
quasiclassical domains, then we could adopt a subjective point of view, as in some
traditional discussions of quantum mechanics, and say that the IGUS "chooses"
its coarse graining of histories and, therefore, "chooses" a particular quasiclassical
domain, or a subset of such domains, for further coarse graining. It would be better,
however, to say that the IGUS evolves to exploit a particular quasiclassical domain
or set of such domains. Then IGUSes, including human beings, occupy no special
place and play no preferred role in the laws of physics. They merely utilize the
probabilities presented by quantum mechanics in the context of a quasiclassical
domain.
XIII. CONCLUSIONS
We have sketched a program for understanding the quantum mechanics of the
universe and the quantum mechanics of the laboratory, in which the notion of
quasiclassical domain plays a central role. To carry out that program, it is important
to complete the definition of a quasiclassical domain by finding the general definition
for classicity. Once that is accomplished, the question of how many and what kinds
of essentially inequivalent quasiclassical domains follow from p and H becomes a
topic for serious theoretical research. So is the question of what are the general
properties of IGUSes that can exist in the universe exploiting various quasiclassical
domains, or the unique one if there is essentially only one.
It would be a striking and deeply important fact of the universe if, among its
maximal sets of decohering histories, there were one roughly equivalent group with
much higher classicities than all the others. That would then be the quasiclassical
domain, completely independent of any subjective criterion, and realized within
quantum mechanics by utilizing only the initial condition of the universe and the
Hamiltonian of the elementary particles.
[16] Perhaps, as W. Unruh has suggested, there are complex adaptive systems, making no use of
prediction, that can function in a highly quantum-mechanical way. If this is the case, they are
very different from anything we know or understand.
Whether the universe exhibits one or many maximal sets of branching alter-
native histories with high classicities, those quasiclassical domains are the possible
arenas of prediction in quantum mechanics.
It might seem at first sight that in such a picture the complementarity of
quantum mechanics would be lost; in a given situation, for example, either a mo-
mentum or a coordinate could be measured, leading to different kinds of histories.
We believe that impression is illusory. The histories in which an observer, as part
of the universe, measures p and the histories in which that observer measures x are
decohering alternatives. The important point is that the decoherent histories of a
quasiclassical domain contain all possible choices that might be made by all possible
observers that might exist, now, in the past, or in the future for that domain.
The EPR or EPRB situation is no more mysterious. There, a choice of mea-
surements, say, \sigma_z or \sigma_y for a given electron, is correlated with the behavior of
\sigma_z or \sigma_y for another electron because the two together are in a singlet spin state
even though widely separated. Again, the two measurement situations (for \sigma_z and
\sigma_y) decohere from each other, but here, in each, there is also a correlation between
the information obtained about one spin and the information that can be obtained
about the other. This behavior, although unfortunately called "non-local" by some
authors, involves no non-locality in the ordinary sense of quantum field theory and
no possibility of signaling outside the light cone. The problem with the "local real-
ism" that Einstein would have liked is not the locality but the realism. Quantum
mechanics describes alternative decohering histories and one cannot assign "reality"
simultaneously to different alternatives because they are contradictory. Everett10
and others7 have described this situation, not incorrectly, but in a way that has
confused some, by saying that the histories are all "equally real" (meaning only
that quantum mechanics prefers none over another except via probabilities) and by
referring to "many worlds" instead of "many histories".
We conclude that resolution of the problems of interpretation presented by
quantum mechanics is not to be accomplished by further intense scrutiny of the
subject as it applies to reproducible laboratory situations, but rather through an
examination of the origin of the universe and its subsequent history. Quantum
mechanics is best and most fundamentally understood in the context of quantum
cosmology. The founders of quantum mechanics were right in pointing out that
something external to the framework of wave function and Schrödinger equation is
needed to interpret the theory. But it is not a postulated classical world to which
quantum mechanics does not apply. Rather it is the initial condition of the universe
that, together with the action function of the elementary particles and the throws
of quantum dice since the beginning, explains the origin of quasiclassical domain(s)
within quantum theory itself.
456 Murray Gell-Mann and James B. Hartle
ACKNOWLEDGMENTS
One of us, MG-M, would like to acknowledge the great value of conversations about
the meaning of quantum mechanics with Felix Villars and Richard Feynman in
1963-64 and again with Richard Feynman in 1987-88. He is also very grateful to
Valentine Telegdi for discussions during 1985-86, which persuaded him to take up
the subject again after twenty years. Both of us are indebted to Telegdi for further
interesting conversations since 1987. We would also like to thank R. Griffiths for a
useful communication and a critical reading of the manuscript and R. Penrose for
a helpful discussion.
Part of this work was carried out at various times at the Institute for Theoretical
Physics, Santa Barbara, the Aspen Center for Physics, the Santa Fe Institute, and
the Department of Applied Mathematics and Theoretical Physics, University of
Cambridge. We are grateful for the hospitality of these institutions. The work of
JBH was supported in part by NSF grant PHY85-06686 and by a John Simon
Guggenheim Fellowship. The work of MG-M was supported in part by the U.S.
Department of Energy under contract DE-AC-03-81ER40050 and by the Alfred P.
Sloan Foundation.
REFERENCES
For a subject as large as this one it would be an enormous task to cite the literature
in any historically complete way. We have attempted to cite only papers that we
feel will be directly useful to the points raised in the text. These are not always
the earliest nor are they always the latest. In particular we have not attempted to
review or to cite papers where similar problems are discussed from different points
of view.
1. Aharonov, Y., P. Bergmann, and J. Lebowitz. Phys. Rev. B134 (1964):1410.
2. Bohr, N. Atomic Physics and Human Knowledge. New York: John Wiley,
1958.
3. Caldeira, A. 0., and A. J. Leggett. Physica 121A (1983):587.
4. Coleman, S. Nucl. Phys. B310 (1988):643.
5. Cooper, L., and D. VanVechten. Am. J. Phys. 37 (1969):1212.
6. Daneri, A., A. Loinger, and G. M. Prosperi. Nucl. Phys. 33 (1962):297.
7. DeWitt, B. Physics Today 23(9) (1970).
8. DeWitt, B., and R. N. Graham. The Many Worlds Interpretation of Quantum
Mechanics. Princeton: Princeton University Press, 1973.
9. Dicke, R. H. Am. J. Phys. 49 (1981):925.
10. Everett, H. Rev. Mod. Phys. 29 (1957):454.
11. Farhi, E., J. Goldstone, and S. Gutmann. To be published.
12. Feynman, R. P., and F. L. Vernon. Ann. Phys. (N.Y.) 24 (1963):118.
Quantum Mechanics in the Light of Quantum Cosmology 457
1. INTRODUCTION
The point of this article is to discuss some recent work on the application of a body of
ideas normally used in quantum measurement theory to quantum cosmology. The
question that I will address is the following: How, in a quantum theory of gravity
as applied to closed cosmological systems, i.e., in quantum cosmology, does the
gravitational field become classical? The possible answer to this question that I will
discuss involves decoherence of the density matrix of the universe. This necessarily
involves the dissipation of information, making contact with the information theme
of this meeting. But before proceeding to quantum cosmology, we begin by dis-
cussing the emergence of classical behavior in some more down-to-earth quantum
systems.
It is one of the undeniable facts of our experience that the world about us is
described by classical laws to a very high degree of accuracy. In classical mechanics,
a system may be assigned a quite definite state and its evolution is described in a
deterministic manner—given the state of the system at a particular time, one can
predict its state at a later time with certainty. And yet, it is believed that the world
is fundamentally quantum mechanical in nature. Phenomena on all scales up to and
including the entire universe are supposedly described by quantum mechanics. In
quantum mechanics, because superpositions of interfering states are permissible,
it is generally not possible to say that a system is in a definite state. Moreover,
evolution is not deterministic but probabilistic—given the state of the system at a
particular time, one can calculate only the probability of finding it in another state
at a later time.
If quantum theory is to be reconciled with our classical experience, it is clearly
essential to understand the sense in which, and the extent to which, quantum
mechanics reproduces the effects of classical mechanics. This is an issue that assumes particular importance in the quantum theory of measurement. There, one
describes the measuring apparatus in quantum mechanical terms; yet all such ap-
parata behave in a distinctly classical manner when the experimenter's eye reads
the meter.
Early universe cosmology provides another class of situations in which the
emergence of classical behavior from quantum mechanics is a process of particu-
lar interest. In the inflationary universe scenario, for example, the classical density
fluctuations required for galaxy formation supposedly originate in the quantum
fluctuations of a scalar field, hugely amplified by inflation.10,21 This is, in a sense,
an extreme example of a quantum measurement process, in that the large-scale
structure of the universe that we see today is a meter which has permanently
recorded the quantum state of the scalar field at early times. The manner in which
this quantum to classical transition comes about has been discussed by numerous
authors.11,12,28,38,39 A more fundamental situation of interest, and the one with
Information Dissipation in Quantum Cosmology 461
In the configuration space representation, the coherent states |z_n(t)⟩ are given by
They are Gaussian wavepackets strongly peaked about the classical trajectories
z_n(t). One might therefore be tempted to say that the system has become classical,
and that the particle will be following one of the trajectories z_n(t) with probability
|c_n|². The problem, however, is that if the wavepackets met up at some stage in
the future, then they would interfere constructively. One could not, therefore, say
that the particle is following a definite trajectory.
462 Jonathan J. Halliwell
The problem is highlighted when one writes down the pure state density matrix
corresponding to the state (2.1). It is
This differs from Eq. (2.3) by the presence of off-diagonal terms. It is only when
these terms may be neglected that we may say that the particle is following a
definite trajectory.
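The contrast between the two density matrices can be made concrete with a small numerical sketch (an added illustration; two nearly orthogonal Gaussian wavepackets on a grid stand in for the coherent states |z_n(t)⟩):

```python
import numpy as np

# Two (nearly orthogonal) normalized "wavepacket" states on a grid,
# standing in for the coherent states |z_1(t)>, |z_2(t)> of the text.
x = np.linspace(-10, 10, 400)
def packet(center):
    psi = np.exp(-(x - center)**2)
    return psi / np.linalg.norm(psi)

z1, z2 = packet(-3.0), packet(3.0)
c1, c2 = 1/np.sqrt(2), 1/np.sqrt(2)

# Pure-state density matrix of the superposition c1|z1> + c2|z2>:
psi = c1*z1 + c2*z2
rho_pure = np.outer(psi, psi.conj())

# Mixed ("classical") density matrix: same diagonal weights, no interference.
rho_mixed = abs(c1)**2 * np.outer(z1, z1.conj()) \
          + abs(c2)**2 * np.outer(z2, z2.conj())

# The difference is exactly the off-diagonal (interference) part,
# c1 c2* |z1><z2| + c2 c1* |z2><z1|, which is not small:
interference = rho_pure - rho_mixed
print(np.max(np.abs(interference)))
```

Unitary evolution preserves the purity of rho_pure, which is why the interference part cannot simply disappear on its own; that is the difficulty resolved by decoherence below.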
There is no way that under unitary Schrödinger evolution the pure-state den-
sity matrix (2.4) will evolve into the mixed-state density matrix (2.3). How, then,
may the interference terms be suppressed? The resolution of this apparent diffi-
culty comes from the recognition that no macroscopic system can realistically be
considered as closed and isolated from the rest of the world around it. Laboratory
measuring apparata interact with surrounding air molecules; even intergalactic gas
molecules are not isolated, because they interact with the microwave background.
Let us refer to the rest of the world as "the environment," E. Then it can be argued
that it is the inescapable interaction with the environment which leads to a con-
tinuous "measuring" or "monitoring" of a macroscopic system and it is this that
causes the interference terms to become very small. This is decoherence.
Let us study this in more detail. Consider the system S considered above, but
now take into account also the states {|E_n⟩} of the environment E. Let the initial
state of the total system SE be
The coherent states of the system |z_n(t)⟩ thus become correlated with the envi-
ronment states I En). The point, however, is that one is not interested in the state
of the environment. This is traced out in the calculation of any quantities of inter-
est. The object of particular relevance, therefore, is the reduced or coarse-grained
density matrix, obtained by tracing over the environment states:
ρ_S(t) = Tr_E |Φ(t)⟩⟨Φ(t)| = Σ_{n,m} c_n·c_m* ⟨E_m|E_n⟩ |z_n(t)⟩⟨z_m(t)|   (2.7)
Information Dissipation in Quantum Cosmology 463
The density matrix |Φ(t)⟩⟨Φ(t)| of the total system evolves unitarily, of course. The
reduced density matrix (2.7), however, does not. It therefore holds the possibility
of evolving an initially pure state to a final mixed state. In particular, if, as can be
the case, the inner products ⟨E_m|E_n⟩ are very small when n ≠ m, then Eq. (2.7)
will be indistinguishable from the mixed-state density matrix (2.3).
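A minimal sketch of this mechanism (an added illustration, with a two-state system and a two-state environment; the overlap ⟨E_1|E_2⟩ is left as a free parameter): when the environment states are nearly orthogonal, the off-diagonal terms of the reduced density matrix (2.7) are suppressed.

```python
import numpy as np

# A two-state system correlated with environment states |E_1>, |E_2>.
def reduced_density(env_overlap):
    c = np.array([1, 1], dtype=complex) / np.sqrt(2)
    # System basis states |z_1>, |z_2> (orthonormal for simplicity).
    z = np.eye(2, dtype=complex)
    # Environment states with <E_1|E_2> = env_overlap.
    E1 = np.array([1, 0], dtype=complex)
    E2 = np.array([env_overlap, np.sqrt(1 - env_overlap**2)], dtype=complex)
    E = [E1, E2]
    # Total state |Phi> = sum_n c_n |z_n>|E_n>, then trace out E:
    phi = sum(c[n] * np.kron(z[n], E[n]) for n in range(2))
    rho = np.outer(phi, phi.conj()).reshape(2, 2, 2, 2)
    return np.trace(rho, axis1=1, axis2=3)  # rho_S[m, n] = c_m c_n* <E_n|E_m>

# Large overlap: interference survives.  Tiny overlap: it decoheres.
print(np.round(np.abs(reduced_density(0.99)), 3))
print(np.round(np.abs(reduced_density(1e-6)), 3))
```

The total state stays pure throughout; it is only the act of tracing over the unobserved environment that turns the system's description into an effectively mixed one.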
One may now say that the environment has caused the density matrix to
decohere—it has permitted the interfering set of macroscopic configurations to re-
solve into a non-interfering ensemble of states, as used in classical statistical me-
chanics. Or to put it another way, the environment has "collapsed the wave func-
tion" of the system. Or yet another form of words, is to say that the environment
"induces a superselection rule" which forbids superpositions of distinct macroscopic
states from being observed. Note that the loss of information is an important as-
pect to the process. Classical behavior thus emerges only when information about
correlations is dissipated into the environment.
This general body of ideas has been discussed by many people, including Gell-
Mann, Hartle and Telegdi,5 Griffiths,8,9 Joos and Zeh,25 Omnes,35 Peres,37 Unruh
and Zurek,40 Wigner,49 Zeh,50 and Zurek.53,54,55,56
3. QUANTUM COSMOLOGY
We now apply the ideas introduced in the previous section to quantum cosmol-
ogy. This subject began life in the 1960's, with the seminal works of DeWitt,2
Misner,30,31,32,33 and Wheeler.46,47 More recently, it has been revitalized primar-
ily by Hartle and Hawking20 and by Vilenkin.41,42,43,44,45 Some review articles are
those by Hartle16,17 and Halliwell.14
The approach is to apply ideas from an as-yet incomplete quantum theory
of gravity to closed cosmological models. One imagines that the four-dimensional
space-time is sliced up into three-surfaces, and one concentrates on the variables
defined on the three-surfaces which describe the configuration of the gravitational
and matter fields. These are the three-metric h_ij and the matter field, which we take
to be a scalar field φ. The quantum state of the system is then represented by a wave
functional Ψ[h_ij, φ], a functional of the metric and scalar field configurations. For
rather fundamental reasons, the wave functional does not depend on time explicitly.
Loosely speaking, information about time is already contained in the variables h_ij
and φ. Because it does not have an explicit time label, Ψ obeys not a time-dependent
Schrödinger equation, but a zero-energy equation of the form
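The displayed equation is lost in this scan. The zero-energy equation in question is the Wheeler–DeWitt equation; its standard schematic form is reproduced below as a reconstruction (conventions and factor ordering vary between authors, so this should not be read as the author's exact expression), with the momentum replaced by π^ij → −i δ/δh_ij:

```latex
% Wheeler-DeWitt equation, schematic form (reconstruction):
\mathcal{H}\,\Psi \equiv
\left[ -\,G_{ijkl}\,\frac{\delta^{2}}{\delta h_{ij}\,\delta h_{kl}}
  - \sqrt{h}\left({}^{3}R - 2\Lambda\right)
  + \mathcal{H}_{\phi} \right] \Psi[h_{ij},\phi] = 0
```

Here G_ijkl is the DeWitt supermetric, ³R the intrinsic curvature of the three-surface, and ℋ_φ the Hamiltonian density of the scalar field.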
where π^ij is the momentum conjugate to h_ij.[1] The wave function (3.2) is therefore
analogous to the sum of coherent states (2.1). The wave function Ψ_n[h_ij, φ] is a
slowly varying function of the three-metric. It describes quantum field theory for
the scalar field φ on the gravitational background h_ij.
So the first requirement for the emergence of classical behavior is satisfied by
the solution (3.2)—the wave function is peaked about a set of classical solutions.
But what about the second requirement, decoherence? Let us apply the ideas in-
troduced in the previous sections and introduce an environment which continually
monitors the metric. One meets with an immediate difficulty. The entire universe
has no environment. It is not an open system, but a closed one: in fact, it is the only
genuinely closed system we know of. The point, however, is that one is never inter-
ested in measuring more than a small fraction of the potentially observable features
of the universe. One may therefore regard just some of the variables describing the
universe as the observed system and the rest as environment. The latter are traced
out in the density matrix. In this way, some, but certainly not all, of the variables
describing the universe may become classical.
Which variables do we take to be the environment? There is, in general, no
obvious natural choice. However, here we are interested in understanding how the
gravitational field becomes classical, so it is perhaps appropriate to regard the
matter modes as an environment for the metric. With this choice, the reduced
density matrix corresponding to the wave function (3.2) is
The object is to show that this is small for h'_ij ≠ h_ij. It is very difficult to offer
general arguments as to the extent to which this is the case, but one can see it
for particular models. Numerous models have been considered in the literature
[1]It is actually rather difficult to construct the analogue of coherent states in quantum cosmology.
See, however, Kiefer.27
(for example Fukuyama and Morikawa,34 Halliwell,15 Kiefer,26 Mellor and Moss,29
Morikawa,34 Padmanabhan,36 Zeh61,62).
For definiteness, let us briefly consider one particular model.15 Suppose we
restrict the metric to be of the Robertson-Walker type:
ds² = −dt² + a²(t) dΩ₃²   (3.5)
where dΩ₃² is the metric on the three-sphere. Then the gravitational field is described
solely by the scale factor a. Let us take the only source to be a cosmological constant
A. One may show that the wave function for this model is of the form (3.2), and
the e^{iS} part indicates that it is peaked about classical solutions of the form
a(t) = (1/H) cosh(Ht)   (3.6)
where H² = Λ/3. This is de Sitter space. Most models that have been considered
in the literature use the full infinite number of modes of the scalar field as the
environment. However, this leads to certain technical complications, so here we will
do something simpler. The de Sitter solutions have a horizon size H⁻¹. One
may separate the scalar field modes into long (1) or short (s) wavelength modes,
φ = φ_l + φ_s, depending on whether their wavelength is, respectively, greater or less
than the horizon size. The number of modes outside the horizon is actually finite;
moreover, they are not observable, so it seems reasonable to consider these as the
environment. With this choice, and with a particular choice for the quantum state
of the scalar field, one finds that the reduced density matrix is
ρ(a, ā) ≈ exp[−(a − ā)²/(σ²a)]   (3.7)
This differs from Eq. (3.8) in one crucial respect, namely in the sign between a
and ā in Eq. (4.2). This has the consequence that ρ is always very small, even
when a = ā. The interference between expanding and collapsing components of the
wave function may therefore be neglected.
[2]Because there is no explicit time label, one cannot say which of the two solutions corresponds
to collapsing and which corresponds to expanding—one can only make relative statements. I am
grateful to H. D. Zeh for emphasizing this point to me.
ACKNOWLEDGMENTS
I would like to thank Jim Hartle, Raymond Laflamme, Seth Lloyd, Jorma Louko,
Ian Moss, Don Page and H. Dieter Zeh for useful conversations. I am particularly
grateful to Wojciech Zurek for many very enlightening discussions on decoherence.
I would also like to thank Wojciech for organizing such an interesting and successful
meeting.
REFERENCES
1. Coleman, S. Nucl. Phys. B310 (1988):643.
2. DeWitt, B. Phys. Rev. 160 (1967):1113.
3. Ellis, J., S. Mohanty, and D. V. Nanopoulos. Phys. Lett. 221B (1989):113.
4. Fukuyama, T., and M. Morikawa. Kyoto preprint KUNS 936, 1988.
5. Gell-Mann, M., J. B. Hartle and V. Telegdi. Work in progress, 1989.
6. Giddings, S., and A. Strominger. Nucl. Phys. B306 (1988):890.
7. Giddings, S., and A. Strominger. Nucl. Phys. B307 (1988):854.
8. Griffiths, R. J. Stat. Phys. 36 (1984):219.
9. Griffiths, R. Am. J. Phys. 55 (1987):11.
10. Guth, A. H., and S. Y. Pi. Phys. Rev. Lett. 49 (1982):1110.
11. Guth, A. H., and S. Y. Pi. Phys. Rev. D32 (1985):1899.
12. Halliwell, J. J. Phys. Lett. 185B (1987):341.
13. Halliwell, J. J. Phys. Rev. D36 (1987):3626.
14. Halliwell, J. J. Santa Barbara ITP preprint NSF-ITP-88-131, 1988. An exten-
sive list of papers on quantum cosmology may be found in J. J. Halliwell, ITP
preprint NSF-ITP-88-132, 1988.
15. Halliwell, J. J. Phys. Rev. D39 (1989):2912.
16. Hartle, J. B. In High Energy Physics, proceedings of the Yale Summer School,
New Haven, Connecticut, edited by M. J. Bowick and F. Gursey. Singapore:
World Scientific, 1985.
17. Hartle, J. B. In Gravitation in Astrophysics, Proceedings of the Cargese Ad-
vanced Summer Institute, Cargese, France, 1986.
18. Hartle, J. B. Phys. Rev. D37 (1988):2818.
19. Hartle, J. B. Phys. Rev. D38 (1988):2985.
20. Hartle, J. B., and S. W. Hawking. Phys. Rev. D28 (1983):2960.
21. Hawking, S. W. Phys. Lett. 115B (1982):295.
22. Hawking, S. W. Phys. Lett. 195B (1987):337.
23. Hawking, S. W. Phys. Rev. D37 (1988):904.
24. Hawking, S. W., and R. Laflamme. Phys. Lett. 209B (1988):39.
25. Joos, E., and H. D. Zeh. Z. Phys. B 59 (1985):223.
26. Kiefer, C. Class. Quantum Grav. 4 (1987):1369.
27. Kiefer, C. Phys. Rev. D38 (1988):1761.
28. Lyth, D. Phys. Rev. D31 (1985):1931.
29. Mellor, F., and I. G. Moss. Newcastle preprint, 1988.
30. Misner, C. W. Phys. Rev. 186 (1969):1319.
31. Misner, C. W. Phys. Rev. Lett. 22 (1969):1071.
32. Misner, C. W. In Relativity, edited by M. Carmeli, S. Fickler and L. Witten.
San Francisco: Plenum, 1970.
33. Misner, C. W. In Magic without Magic: John Archibald Wheeler, a Collection
of Essays in Honor of his 60th Birthday, edited by J. Klauder. San Francisco:
Freeman, 1972.
34. Morikawa, M. Kyoto Preprint KUNS 923, 1988.
INTRODUCTION
Let me start off by telling you a science fiction story that is essentially in the tradi-
tion of curious stories about quantum mechanics, like the story about Schrödinger's
cat and the story about Wigner's friend. Those stories both begin with the assump-
tion that every physical system in the world (not merely subatomic particles, but
measuring instruments and tables and chairs and cats and people and oceans and
stars, too) is a quantum-mechanical system, and that all such systems evolve en-
tirely in accordance with the linear quantum-mechanical equations of motion, and
that every self-adjoint local operator of such systems can, at least in principle, be
measured. Those are the rules of the game we're going to play here; and what I
want to tell you about is a move which is possible in this game, but which hasn't
been considered before.
The old stories of Schrödinger's cat and Wigner's friend end at a point where
(in the first case) the cat is in a superposition of states, one in which it is alive
and the other in which it is dead; or where (in the second case) the friend is in
a superposition of states that entail various mutually exclusive beliefs about the
result of some given experiment. Suppose, for example, that Wigner's friend carries
out a measurement of the y-spin of a spin-1/2 particle p that is initially prepared
in the state |σ_x = +1/2⟩_p. He carries out the measurement by means of a measuring
device that interacts with p and that he subsequently looks at in order to ascertain
the result of the measurement. The end of that story looks like this:
|α⟩ = (1/√2) [ |Believes that σ_y = +1/2⟩_Friend · |Shows that σ_y = +1/2⟩_Measuring Device · |σ_y = +1/2⟩_p
+ |Believes that σ_y = −1/2⟩_Friend · |Shows that σ_y = −1/2⟩_Measuring Device · |σ_y = −1/2⟩_p ]
(The phrase "Believes that σ_y = +1/2," of course, doesn't completely specify the
quantum state of Wigner's friend's very complicated brain. But the many other
degrees of freedom of that system (those, for example, that specify what sort of ice
cream Wigner's friend prefers) simply don't concern us here, and so, for the moment,
we'll ignore them.) Now, such endings as this are usually judged to be so bizarre,
and so blatantly to contradict daily experience, as to invalidate the assumption that
gives rise to these stories. That is, these stories are usually judged to imply that
there must be physical processes in the world that cannot be described by linear
equations of motion, processes like the collapse of the wave function.
There are, on the other hand, as everybody knows, a number of ways of at-
tempting to deny this judgement; there are a number of ways of attempting to
suppose that this is genuinely the way things are at the end of a measuring process,
and that this state somehow manages to appear to Wigner's friend or to count for
Wigner's friend either as a case of believing that σ_y = +1/2 or a case of believing
that σ_y = −1/2.
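The state |α⟩ of the story can be assembled explicitly in a toy model (an added sketch, with two-dimensional stand-ins for the friend's and the device's relevant degrees of freedom). Tracing out everything but the friend shows why, locally, the friend's state looks like an equal mixture of the two belief states:

```python
import numpy as np

def kron_all(*vs):
    out = np.array([1.0 + 0j])
    for v in vs:
        out = np.kron(out, v)
    return out

# sigma_y eigenstates of the particle p.
y_plus  = np.array([1,  1j], dtype=complex) / np.sqrt(2)
y_minus = np.array([1, -1j], dtype=complex) / np.sqrt(2)
# Two-dimensional stand-ins for the friend's belief states and the
# device's pointer states ("shows +" / "shows -").
up, down = np.eye(2, dtype=complex)

# The end of the story: the state |alpha> (friend x device x particle).
alpha = (kron_all(up, up, y_plus) + kron_all(down, down, y_minus)) / np.sqrt(2)

# The friend's reduced state: trace out device and particle.
rho = np.outer(alpha, alpha.conj()).reshape(2, 4, 2, 4)
rho_friend = np.trace(rho, axis1=1, axis2=3)
print(np.round(np.abs(rho_friend), 3))  # equal diagonal mixture
```

No measurement on the friend alone can distinguish this state from a classical coin flip between the two beliefs; the superposition shows up only in observables of the whole composite system.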
One of these attempts goes back to Hugh Everett, and has come to be called
the Many Worlds interpretation of quantum mechanics. I think it's too bad that
Everett's very simple thesis (which is just that the linear quantal equations of
motion are always exactly right) has come to be called that at all, because that
name has sometimes encouraged a false impression that there are supposed to be
more physical universes around after a measurement than there were before it. It
might have been better to call what Everett came up with a "many-points-of-view"
interpretation of quantum mechanics, or something like that, because it is surely
true of Everett's picture (as it is in all other pictures of quantum theory that I
know about) that there is always exactly one physical universe. However, the rules
of Everett's game, which he insists we play to the very end, require that every one
of the physical systems of which that universe is composed—including cats and
measuring instruments and my friend's brain and my own brain—can be, and often
are, in those bizarre superpositions. The various elements of such a superposition,
in the case of brains, correspond to a variety of mutually exclusive points of view
The Quantum Mechanics of Self-Measurement 473
about the world, as it were, all of which are simultaneously associated with one and
the same physical observer.
Needless to say, in some given physical situation, different observers may be
associated with different numbers of such points of view (they may, that is, inhabit
different numbers of Everett worlds). Suppose, for example, that we add a second
friend (Friend # 2) to the little story we just told. Suppose at the end of that story,
when the state of the composite system consisting of p and of the measuring device
for σ_y and of Friend #1 is |α⟩, that Friend #2 measures A, where A is a maximal
observable of that composite system such that A|α⟩ = a|α⟩. Friend #2 carries out
that measurement by means of an A-measuring device (which, according to the
rules of the game, can always be constructed) which interacts with that composite
system and which Friend #2 subsequently looks at to ascertain the result of the
measurement. When that's all done (since the result of this measurement will with
certainty be A = a), things will look like this:
In this state, Friend #1 inhabits two Everett worlds (the world in which σ_y = +1/2
and the world in which σ_y = −1/2), whereas Friend #2 inhabits only one (the
world in which A = a), which by itself encompasses the entire state |β⟩. Moreover,
in his single world, Friend #2 possesses something like a photograph of the two
worlds which Friend #1 simultaneously inhabits (he possesses, that is, a recording
in his measuring device of the fact that A = a). By means of his measurement of
A, Friend #2 directly sees the full superposition of Friend #1's brain states; and
indeed, he can even specify the relative sign between those states.
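Friend #2's measurement can be modelled in the same toy setting (an added sketch; the projector onto |α⟩ stands in for the maximal observable A, since only the |α⟩ eigendirection matters here). A yields a = +1 with certainty on |α⟩, and it is sensitive to the relative sign between the two branches:

```python
import numpy as np

# Composite system: friend (2) x device (2) x particle (2), as before.
def kron_all(*vs):
    out = np.array([1.0 + 0j])
    for v in vs:
        out = np.kron(out, v)
    return out

y_plus  = np.array([1,  1j], dtype=complex) / np.sqrt(2)
y_minus = np.array([1, -1j], dtype=complex) / np.sqrt(2)
up, down = np.eye(2, dtype=complex)

alpha      = (kron_all(up, up, y_plus) + kron_all(down, down, y_minus)) / np.sqrt(2)
alpha_flip = (kron_all(up, up, y_plus) - kron_all(down, down, y_minus)) / np.sqrt(2)

# An observable with |alpha> as eigenstate, eigenvalue a = +1; a stand-in
# for the maximal observable A of the text.
A = np.outer(alpha, alpha.conj())

# On |alpha>, a measurement of A gives a = +1 with certainty...
print(np.real(alpha.conj() @ A @ alpha))
# ...and A distinguishes |alpha> from the state with flipped relative
# sign, which no measurement on the friend's brain alone could do.
print(np.real(alpha_flip.conj() @ A @ alpha_flip))
```

This is the "photograph of the two Everett worlds": the recorded value A = a carries information about the superposition as a whole, including its relative phase.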
Nothing ought to be very surprising in this, and indeed, it was all very well
known to Everett and his readers. So far as Friend #2 is concerned, after all,
Friend #1, whatever else he may be, is a physical system out there in the external
world; and consequently, according to the rules of our game, Friend #1 ought to
be no less susceptible to being measured in superpositions than a single subatomic
particle. But this need not be the very end of the game. One more move, which
is fully in accordance with the rules of the game, is possible; a move that Everett
never mentions. Here it is: Suppose, at the end of the slightly longer story we just
told, when the state of things is |β⟩, that Friend #2 shows his photograph of the
two Everett worlds that Friend #1 simultaneously inhabits to Friend #1. Suppose,
that is, that Friend #1 now looks at the measuring apparatus for A. Well, it's quite
474 David Z. Albert
that in this case the order in which those two measurements are carried out will be
important).
This automaton, then, when |γ⟩ obtains, knows (in whatever sense it may be
appropriate to speak of automata knowing things), accurately and simultaneously,
the values of both A and σ_y, even though those two observables don't commute.
What this means (leaving aside, as I said, all of the foggy questions about what
it might be like from the perspective of the automaton, which is what originally
drove us to the talk about multiple worlds) is that this automaton, in this state,
is in a position to predict, correctly, without peeking, the outcomes of upcoming
measurements of either A or σ_y or both, even though A and σ_y are, according to
the familiar dogma about measurement theory, incompatible.
Moreover, no automaton in the world other than this one (no observer in the
world other than Friend #1, in science fiction talk) can ever, even in principle, be
in a position to simultaneously predict the outcomes of upcoming measurements
of precisely those two observables (even though they can, of course, know either
one). The possibility of Friend #1's being able to make such predictions hinges on
the fact that A is an observable of (among other things) Friend #1 himself. There
is a well-defined sense here in which this automaton, this friend, has privileged
epistemic access to itself.
Let me (by way of finishing up) try to expand on that just a little bit.
There is another famous attempt to suppose that the linear quantum-mechanical
equations of motion are invariably the true equations of motion of the wave-function
of the entire physical world. This attempt goes back to Bohm, and has recently been
championed and further developed by John Bell. It's a hidden variables theory (it
is, more precisely, a completely deterministic hidden variables theory, which ex-
actly reproduces the statistical predictions of non-relativistic quantum mechanics
by means of an averaging over the various possible values of those hidden variables),
and it has the same straightforward sort of realistic interpretation as does, say, clas-
sical mechanics. It's well known that there are lots of things in this theory that one
ought to be unhappy about (I'm thinking mostly about non-locality here); but let's
concentrate, for just a moment, on the fact that such a theory is logically possible.
Since this theory makes all the same predictions as quantum mechanics does, every
one of those predictions, including the ones in our story about quantum-mechanical
automata, will necessarily arise in this theory, too.
That produces an odd situation. Remember the two automata in the story
(Friend #1 and Friend #2). Suppose that |γ⟩ obtains, and suppose that things
are set up so that some future act of #1 is to be determined by the results of
upcoming measurements of ay and A. On Bohm and Bell's theory, there is, right
now, a matter of fact about what that act is going to be, and it follows from what
we discovered about the automata that #1 can correctly predict what that act is going
to be, but not so for automaton #2, nor for any other one, anywhere in the world.
So it turns out that it can arise, in a completely deterministic physical theory,
that an automaton can in principle be constructed that can ascertain certain of its
own acts in advance, even though no other automaton, and no external observer
whatever—supposing even that they can measure with infinite delicacy and infinite
precision—can ascertain them; and that strikes me as something of a surprise.
Perhaps it deserves to be emphasized that there are no paradoxes here, and
no violations of quantum theory from which, after all, it was all derived. We have
simply discovered a new move here, a move that entirely accords with the rules of
quantum mechanics (if the quantum-mechanical rules are all the rules there are)
whereby quantum-mechanical observers can sometimes effectively carry out certain
measurements on themselves. This move just wasn't anticipated by the founders of
quantum mechanics, and it happens that when you make a move like this, things
begin to look very odd, and the uncertainty relations cease to apply in the long
familiar ways.
ACKNOWLEDGMENT
I'm thankful to Deborah Gieringer for her technical assistance in preparing this
paper for publication.
L. A. Khalfin
International Solvay Institutes of Physics and Chemistry, Université Libre de Bruxelles,
CP-231, Campus Plaine, Boulevard du Triomphe, B-1050 Bruxelles, Belgium; permanent
address: Steklov Mathematical Institute of the Academy of Sciences U.S.S.R., Fontanka
27, 191011 Leningrad D-11, U.S.S.R.
"I do not believe in micro- and macro-laws, but only in (structural) laws
of general validity." —A. Einstein
Fast progress in experimental techniques supports more and more thorough
examinations of the applicability of quantum theory far beyond the range of phe-
nomena from which quantum theory arose. For all that, no restrictions in principle
are revealed on its applicability, nor any inherently classical physical systems.
However, according to the Copenhagen interpretation[1] the foundation of quantum
theory is classical ideas (the classical world) taken equally with quantum ideas
rather than being deduced from the latter. The Copenhagen interpretation stipu-
lates the joint application of two description modes, the classical and the quantum,
is the probability of the coincidence of such events: the measurement of the observ-
able A_1j gives the result k_1, that of the observable A_2j, the result k_2, etc.
PROBLEM
To find the general conditions on the values which can be expressed in the
form of Eq. (1).
In general this problem has not been solved up to now. In our work we consider
some of the not-simple cases. But now we will go to the simplest nontrivial case:
i = 1, 2; j = 1, 2; k = 1, 2. Assume for simplicity that A_ij = ±1, so that A_ij² = 1.
THEOREM 1
PROOF It is possible to prove this result for some more general cases, but one can
see the direct and simple elegant proof:
2√2·1 − C = (1/√2)[(A11 − (A21 + A22)/√2)² + (A12 − (A21 − A22)/√2)²] ≥ 0   (2)
where C = A11·A21 + A11·A22 + A12·A21 − A12·A22, each A_ij² = 1, and each A1j
commutes with each A2k.
For the classical case, in which all A11, A12, A21, A22 commute (c-numbers), a
trivial inequality for these c-numbers follows:
A11·A21 + A11·A22 + A12·A21 − A12·A22 ≤ 2·1   (4)
The inequality (4) gives the algebraic structure of the classical Bell-CHSH inequality
for correlation functions
[⟨A11·A21⟩ + ⟨A11·A22⟩ + ⟨A12·A21⟩ − ⟨A12·A22⟩] ≤ 2   (5)
The inequality (2) gives the algebraic structure of the quantum Tsirelson's inequal-
ity for correlation functions
[⟨A11·A21⟩ + ⟨A11·A22⟩ + ⟨A12·A21⟩ − ⟨A12·A22⟩] ≤ 2√2.   (6)
The inequalities (5) and (6) are model-independent; that is, they do not depend
on any physical mechanism or physical parameters, except the space-time parameters
connected with local causality. We see a principal, fundamental gap between the
classical Bell and the quantum Tsirelson inequalities, notably because the quantum
Tsirelson inequalities do not contain the Planck constant. It is interesting to point
out that Tsirelson's quantum inequalities for the general case are the same as for
the simplest spin 1/2 case.
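As a numerical illustration (added here, not part of the original text), the operator identity behind the proof and the bound 2√2 can be checked directly with numpy. The particular spin-1/2 observables below are one standard choice that saturates the bound:

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

# A11, A12 act on the first particle, A21, A22 on the second,
# so the two groups commute; all four square to the identity.
A11 = np.kron(sz, I2)
A12 = np.kron(sx, I2)
A21 = np.kron(I2, (sz + sx) / np.sqrt(2))
A22 = np.kron(I2, (sz - sx) / np.sqrt(2))

C = A11 @ A21 + A11 @ A22 + A12 @ A21 - A12 @ A22

# The identity used in the proof: C^2 = 4*1 + [A11, A12]*[A22, A21]
comm = lambda X, Y: X @ Y - Y @ X
assert np.allclose(C @ C, 4 * np.eye(4) + comm(A11, A12) @ comm(A22, A21))

# The operator norm of C reaches Tsirelson's bound 2*sqrt(2)
norm_C = np.linalg.norm(C, 2)
print(norm_C)
```

For commuting (classical) choices of the four observables the same expression never exceeds 2, in agreement with inequality (4).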
The class of correlation functions ⟨⟨...⟩⟩, or rather of "behaviors" in the sense
of section 4, allowed by quantum Tsirelson's inequalities is essentially smaller than
that allowed by general probabilistic local causality (see section 4):

[⟨A11·A21⟩ + ⟨A11·A22⟩ + ⟨A12·A21⟩ − ⟨A12·A22⟩] ≤ 4 .   (7)
It is possible to obtain inequalities which, holding true for quantum objects,
approximate the classical Bell inequalities in quasi-classical situations. Such
inequalities, which of course are model-dependent, were derived in Khalfin and
Tsirelson,8 and we called them the quasi-classical analogs of the classical Bell
inequalities. One example of these inequalities is:

[⟨A11·A21⟩ + ⟨A11·A22⟩ + ⟨A12·A21⟩ − ⟨A12·A22⟩] ≤ 2 + c·(ℏ²/σ)   (8)
The general stochastic behavior gives us the general inequality (7). Hidden
deterministic behavior gives us the classical Bell inequalities. It is interesting that
so-called dynamical chaos is also a hidden deterministic behavior. All classical
stochastic phenomena of probability theory are hidden deterministic behaviors.
Only the quantum behavior gives us "real" stochastic phenomena.
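The point about dynamical chaos can be illustrated with a minimal sketch (an added illustration, not from the text): the logistic map is completely deterministic, yet its coarse-grained output is statistically hard to distinguish from coin flips:

```python
import numpy as np

# Deterministic dynamics: the logistic map x -> 4x(1-x)
def orbit(x0, n):
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = 4.0 * x * (1.0 - x)
        xs[i] = x
    return xs

xs = orbit(0.123456789, 50_000)
s = np.where(xs > 0.5, 1.0, -1.0)          # coarse-grained "coin flips"

# The symbol sequence looks statistically like unbiased noise ...
print(s.mean(), np.mean(s[:-1] * s[1:]))   # mean and lag-1 correlation, both small

# ... yet it is completely reproducible from the initial condition:
assert np.array_equal(xs, orbit(0.123456789, 50_000))
```

The randomness here is "hidden deterministic": rerunning with the same seed value reproduces the sequence exactly, which is precisely what quantum behavior, in the sense of this chapter, does not allow.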
H(q1, p1; q2, p2) = (1/2m1)·p1² + (1/2m2)·p2² + (k1/2)·q1² + (k2/2)·q2² + k12·q1·q2   (13)
and let us add to this interaction the fluctuation forces, which for simplicity we
will assume are not correlated for objects 1 and 2. The equations of motion, in the
language of stochastic differential equations, are:

dp1 = −k1·q1·dt − k12·q2·dt + A1·db1 ,  dq1 = (1/m1)·p1·dt
                                                               (14)
dp2 = −k2·q2·dt − k12·q1·dt + A2·db2 ,  dq2 = (1/m2)·p2·dt
484 L A. Khalfin
where b1(t), b2(t) are noncorrelated Wiener processes, the derivatives of which are
white noise processes:

⟨ḃ1,2(s)·ḃ1,2(t)⟩ = δ(s − t)   (15)

and A1, A2 are the intensities of the fluctuating forces. For A1, A2, by using the
fluctuation-dissipation conditions, the following expression is derived:
Remarkably, the condition (17) does not depend on k1, k2. For the simpler case
of identical objects we have, from Eq. (17):

|k12| < (1/ℏ)·A² = (2Γ/ℏ)·kB·T = 2Γ/τtherm ,  τtherm ≡ ℏ/(kB·T)   (18)
For a more general form of interaction with potential energy U(q1, q2), the condition
(18) becomes

|∂²U(q1, q2)/∂q1∂q2| ≤ 2Γ/τtherm   (19)

which defines the corresponding characteristic time; for times t larger than this
characteristic time we see classical (without any quantum interference) dynamics.
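A sketch of how the stochastic equations (14) can be integrated numerically with an Euler-Maruyama scheme. All parameter values are illustrative assumptions, and the dissipative terms implied by the (omitted) fluctuation-dissipation conditions (16)-(17) are not included:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed values, not ones given in the text)
m1 = m2 = 1.0
k1 = k2 = 1.0
k12 = 0.2            # bilinear coupling of Eq. (13)
A1 = A2 = 0.1        # noise intensities of Eq. (14)
dt, nsteps = 1e-3, 20_000

q1 = q2 = 1.0
p1 = p2 = 0.0
for _ in range(nsteps):
    db1, db2 = rng.normal(0.0, np.sqrt(dt), 2)   # independent Wiener increments
    dp1 = (-k1 * q1 - k12 * q2) * dt + A1 * db1
    dp2 = (-k2 * q2 - k12 * q1) * dt + A2 * db2
    q1 += (p1 / m1) * dt
    q2 += (p2 / m2) * dt
    p1 += dp1
    p2 += dp2

print(q1, p1, q2, p2)   # one sample of the noisy trajectory at t = nsteps*dt
```

Each Wiener increment is drawn with variance dt, so that the discrete sum reproduces the white-noise correlation (15) in the continuum limit.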
H|Ψ(t)⟩ = iℏ·∂|Ψ(t)⟩/∂t ,  H = const(t)

|Ψ(t = 0)⟩ = |Ψ0⟩ ,  ⟨Ψ0|Ψ0⟩ = 1

H|φk⟩ = Ek|φk⟩ ,  ⟨φk|φk'⟩ = δkk'

H|φE⟩ = E|φE⟩ ,  ⟨φE|φE'⟩ = δ(E − E')
From the condition ⟨Ψ0|Ψ0⟩ = 1 it follows that there must exist (independently of
H) some self-adjoint operator H0 for which |Ψ0⟩ is an eigenvector of the discrete
spectrum of H0:

H0|Ψ0⟩ = E0|Ψ0⟩   (22)

If we choose a different initial vector state |Ψ0⟩, then H0 will also be different.
The initial vector state |Ψ0⟩ thus defines, additionally to and independently of H,
the information on the "preparation" or the origin of the investigated physical system.
From H and H0 we can define the interaction part of the Hamiltonian, Hint =
H − H0.
Let us now define the decay amplitude p(t) = ⟨Ψ0|Ψ(t)⟩. From the point of view
of probability theory, the decay amplitude p(t) is a characteristic function.
DEFINITION. The solution |Ψ(t)⟩ (which was defined by the operator H and the
initial vector |Ψ0⟩, or the operator H0; see Eq. (22)) we call irreversible if

|p(t)| → 0 as t → ∞ ,   (25)

∫0∞ |p(t)|²·dt < ∞ .   (27)
where c(E) is the continuous "preparation function," the following decomposition
follows for t > (ℏ/E0):

p(t) ≅ exp(−(i·E0 + Γ)·t/ℏ) + (ℏ/π)·Γ·c(0)/((E0² + Γ²)·t) + o(1/t) .   (29)
The exponential term (dominant for t of order ℏ/Γ) does not depend on the
"preparation function," and the nonexponential term for Γ ≪ E0 is very small for
these times. If t → ∞, Γ → 0 with Γ·t = const, all nonexponential terms disappear.
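The origin of the nonexponential term, a spectrum bounded from below, can be seen in a small numerical sketch (an added illustration with assumed parameters, ℏ = 1; a truncated Lorentzian weight stands in for the energy distribution):

```python
import numpy as np

# Energy distribution: a Lorentzian of width Gamma centered at E0,
# cut off at E >= 0 because the spectrum is bounded from below (hbar = 1)
E0, Gamma = 10.0, 0.5          # illustrative values
dE = 1e-3
E = np.arange(0.0, 400.0, dE)
w = (Gamma / np.pi) / ((E - E0) ** 2 + Gamma ** 2)
w /= w.sum() * dE              # normalize on the half-line

def p(t):
    # decay amplitude p(t) = integral of w(E)*exp(-iEt) dE
    return np.sum(w * np.exp(-1j * E * t)) * dE

for t in (1.0, 5.0, 40.0):
    print(t, abs(p(t)) / np.exp(-Gamma * t))
# the ratio drifts far above 1 at large t: a power-law tail replaces
# the exponential, precisely because the spectrum has a lower bound
```

Without the cutoff (a full-line Lorentzian) the decay would be exactly exponential for all times.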
Now we investigate the problem of the foundation of statistical physics by
using methods which are analogous to the methods of the quantum decay theory. From
section 4 it follows that we must investigate this problem within quantum theory.
First of all, we must understand that the problems of statistical physics are only
a special kind of the general problems of quantum theory, and must be defined
by some additional structure. These problems are defined by the full Hamiltonian H,
the initial vector state |Ψ0⟩ (or by H0), and by an additional self-adjoint operator A:
The full information in statistical physics is in the set of probabilities {Pk(t)}, ∀k,
where

Pk(t) = |pk(t)|² ,  pk(t) = ⟨φk|Ψ(t)⟩   (31)
It is very essential to point out that the full Hamiltonian H includes all quantum
fields which define the interactions, but A defines the finite number N of particles
in the finite box (see Figure 2) for these particles, but not for the fields! For this
reason the full Hamiltonian H, which includes the quantum fields in infinite space,
has an absolutely continuous spectrum, which gives us the dynamical (spontaneous)
origin of irreversibility.
Now we can define the usual entropy of statistical physics:
for which we must obtain, under some special conditions, the proof of the Second Law
(Boltzmann H-Theorem). The usual von Neumann entropy is a dynamical invariant
for the general problems of quantum theory and has no direct correspondence to the
entropy (32) for the problems of statistical physics (this is evident because for the
von Neumann entropy the Second Law is not true: this entropy is a dynamical
invariant of quantum theory).
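The contrast between the invariant von Neumann entropy and an entropy built from the probabilities Pk(t) of Eq. (31) can be checked in a small numerical sketch (the random Hamiltonian and the fixed reference basis are illustrative assumptions, ℏ = 1):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random Hamiltonian and initial pure state on a small Hilbert space
n = 8
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (A + A.conj().T) / 2
psi0 = np.zeros(n, dtype=complex)
psi0[0] = 1.0

evals, evecs = np.linalg.eigh(H)

def state(t):
    # |psi(t)> = exp(-iHt)|psi0>
    return evecs @ (np.exp(-1j * evals * t) * (evecs.conj().T @ psi0))

def vn_entropy(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log(lam)).sum())

def diag_entropy(psi, basis):
    P = np.abs(basis.conj().T @ psi) ** 2   # probabilities P_k(t) as in Eq. (31)
    P = P[P > 1e-12]
    return float(-(P * np.log(P)).sum())

basis = np.eye(n)   # fixed reference basis playing the role of {phi_k}
for t in (0.0, 1.0, 5.0):
    psi = state(t)
    print(t, vn_entropy(np.outer(psi, psi.conj())), diag_entropy(psi, basis))
# the von Neumann entropy stays 0 for the pure state at every time,
# while -sum P_k ln P_k changes under the unitary evolution
```

This only illustrates the invariance statement; it is not, of course, a proof of the Second Law.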
From Eqs. (30) and (31) it is easy to see that

pk(t) = ∫0∞∫0∞ exp(−i(E − E′)t/ℏ)·c(E)·c*(E′)·ρk(E′, E)·dE·dE′ ,  Σk ak·ρk(E′, E) = ρ(E′, E) ,

Pk(∞) = ∫0∞ βk(E)·|c(E)|²·dE ,

Pk(t) ≅ Pk(∞) + γk·exp(−2Γt/ℏ) + a0·exp(−Γt/ℏ)·cos(E0t/ℏ)·Γℏ/((E0² + Γ²)·t) + b0·Γ²ℏ²/((E0² + Γ²)²·t²) .   (36)
The usual axiomatic statistical physics cannot be the exact theory: ergodicity
and mixing are not exactly true, the decrease of the correlation functions is
nonexponential, and the equilibrium distribution depends on Γ (on the relaxation
time). But for usual cases (Γ ≪ E0) the axiomatic statistical physics is a very
good approximation; however, the accuracy of this approximation is not
homogeneous over all problems of statistical physics.
FIGURE 3  The entropy S(t) as a function of time, with the characteristic times t1, t2, t3 marked.
BOLTZMANN H-THEOREM.11 If

then

a. S(t) < S(∞) ;  b. dS(t)/dt ≥ 0 , ∀t ∈ [0, ∞) .   (39)
From this theorem it is possible, under some conditions on H, |Ψ0⟩, and A, to prove
the Second Law for some big but finite interval of time t ∈ [t1, t2] (see Figure 3).
But it can also be proved that for a finite small interval of time t ∈ [t2, t3], in which
the two nonexponential terms in Eq. (36) are of the same order, for general initial
conditions |Ψ0⟩ the Second Law will not be true (see Figure 3). This gives us the
first dynamical mechanism, free of special conditions, for the origin of order from
chaos. The interval of time t ∈ [t2, t3] is an interval of very big times for usual
physical systems (God created life (order) on the last day).
Classical Bell's and Quantum Tsirelson's Inequalities 491
ACKNOWLEDGMENTS
As indicated before, the work reviewed here was done in collaboration with Dr. B. S.
Tsirelson. I am indebted to him for the interesting joint work and interesting
discussions. My big thanks to the Santa Fe Institute, especially to Prof. J. A. Wheeler
and Dr. W. H. Zurek, for the invitation to the workshop "Complexity, Entropy and
the Physics of Information" (Santa Fe, New Mexico, May 29-June 2, 1989). My
big thanks also to the participants of this workshop for interesting discussions. The
final version of this report was prepared at the Santa Fe Institute. My big thanks to
Dr. George A. Cowan, President of the Santa Fe Institute, for the warm hospitality
and the pleasant conditions for scientific work. I thank Prof. T. Toffoli and Dr.
W. H. Zurek for improvement of the English version of this report.
REFERENCES
1. Caldeira, A. O., and A. J. Leggett. Phys. Rev. A 31 (1985):1059.
2. Cirel'son, B. S. (a.k.a. B. S. Tsirelson). Lett. Math. Phys. 4 (1980):93.
3. Diosi, L. Phys. Lett. A122 (1987):221.
4. Diosi, L. Phys. Lett. A129 (1988):419.
5. Fock, V. A., and N. S. Krylov. JETP 17 (1947):93.
6. Joos, E., and H. D. Zeh. Zeitschr. Phys. Ser. B 59 (1985):223.
7. Joos, E. In "New Techniques and Ideas in Quantum Measurement Theory."
Ann. N.Y. Acad. Sci. 480 (1986):6.
8. Khalfin, L. A. DAN USSR 115 (1957):277.
9. Khalfin, L. A. JETP 33 (1958):1371.
10. Khalfin, L. A. DAN USSR 162 (1965):1273.
11. Khalfin, L. A. Theor. & Math. Phys. 35 (1978):425.
12. Khalfin, L. A. Uspekhi Matematicheskikh Nauk 33 (1978):243.
13. Khalfin, L. A. Phys. Lett. 112B (1982):223.
14. Khalfin, L. A. "Bell's Inequalities, Tsirelson Inequalities and K⁰-K̄⁰, D⁰-D̄⁰,
B⁰-B̄⁰ Mesons." Report on the scientific session of the Nuclear Division
of the Academy of Sciences USSR, April 1983; unpublished.
15. Khalfin, L. A., and B. S. Tsirelson. "Quantum and Quasi-Classical Analogs
of Bell's Inequalities." In Proceedings of the Symposium on the Foundations of
Modern Physics, 1985, edited by P. Lahti et al. New York: World Scientific,
1985, 441.
16. Khalfin, L. A. "The Problem of the Foundation of the Statistical Physics, the
Nonexponentiality of the Asymptotic of the Correlation Functions and the
Quantum Theory of Decay." In Abstracts of the First World Congress of the
Bernoulli Society, 1986, edited by Yu. V. Prokhorov, Vol. II. Nauka, 1986, 692.
17. Khalfin, L. A. and B. S. Tsirelson. "A Quantitative Criterion for the Applica-
bility of the Classical Description within the Quantum Theory." In Proceed-
ings of the Symposium on the Foundations of Modern Physics, 1987, edited
by P. Lahti et al. New York: World Scientific, 1987, 369.
18. Khalfin, L. A. "The Problem of the Foundation of Statistical Physics and
the Quantum Decay Theory." Paper presented at the Stefan Banach Inter-
national Mathematical Center, September 1988, Warsaw, Poland; to be pub-
lished.
19. Khalfin, L. A., and B. S. Tsirelson. "Quantum-Classical Correspondence in
Light of Bell's Inequalities." To be published.
20. Unruh, W. G., and W. H. Zurek. Phys. Rev. D40 (1989):1071.
21. Wootters, W. K., and W. H. Zurek. Phys. Rev. D19 (1979):473.
22. Zurek, W. H. In "New Techniques and Ideas in Quantum Measurement The-
ory." Ann. N.Y. Acad. Sci. 480 (1986):89.
INTRODUCTION
Several significant technical advances concerning the interpretation of quantum
mechanics have been made more or less recently, mostly during the last decade.
I refer particularly to the discovery and study of environment-induced superselection
rules,1,2,3,8,25,26 some new general results in semi-classical physics,4,9,16 the
distinction to be made between a macroscopic system and a classically behaving
one,12,13 and the possibility of describing a consistent history of a quantum system as
well as a description of a quantum system by ordinary Boolean logic.14 It turns
out that all of them can now be joined together to provide a completely new inter-
pretation of quantum mechanics to be called here the logical interpretation. This
name is not coined to mean that the progress made along the lines of logic is more
important than any other advance but to stress the unifying role of logic when
bringing them together into a consistent theory. The logical interpretation stands
upon many fewer axioms than the Copenhagen interpretation and, in fact, upon
just a unique universal axiom, and it is not plagued by imprecisely defined words or
notions. Its practical consequences, however, coincide mostly with what comes out
of the Copenhagen interpretation, except for the removal of some of its disturbing
paradoxical features.
There is no consensus as to what must be considered the most basic difficulties
of conventional quantum mechanics. One may use, however, the hindsight provided
by recent advances to identify them with two basic problems, having to do re-
spectively with the status of common sense and the status of empirical facts in
quantum mechanics. The first problem comes out of the huge logical gap separat-
ing the mathematical framework of the theory (with its Hilbert space and so on)
from the ordinary direct physical intuition one has of ordinary physical objects. As
will be seen, this is a real problem boiling down to the relation existing between
physical reality and its description by mathematics and logic; one will have to make
this correspondence clear by stating explicitly how it must be formulated.
The second problem comes from the intrinsically probabilistic character of
quantum mechanics: Remembering that a theoretical probability can only be
checked experimentally by performing a series of trials and noticing that this proce-
dure makes sense only if the result of each individual trial is by itself an undoubtable
fact, one sees that quantum mechanics, as an intrinsically probabilistic theory, must
nevertheless provide room for the certainty of the data shown by a measuring
device, i.e., for facts. The solution of this dilemma will involve a proof of the validity
of some semi-classical determinism within the framework of quantum mechanics.
A complete interpretation will be obtained by solving these two problems. The
general strategy will, however, strongly differ from the Copenhagen approach:
Classically behaving objects, giving rise to observable facts obeying determinism
and allowing their common sense description by usual logic, will be interpreted by
quantum mechanics and not the other way around. This direct interpretation of
what is observed by the most fundamental form of the theory is not only what
should be expected from science but it also turns out to be both straightforward
and fruitful.
GENERAL AXIOMS
The following basic axioms of quantum mechanics will be taken for granted:
Axiom 1 associates a Hilbert space H and an algebra of operators with an
individual isolated physical system S or, more properly, with any theoretical model
of this system.
Axiom 2 defines dynamics by the Schrödinger equation, using a hamiltonian H.
The corresponding evolution operator will be written as U(t) = exp(−2πiHt/h).
Some Progress in Measurement Theory 497
E = ∫ |x⟩⟨x| dx ,
with the predicate to give it a meaning in Hilbert space grammar. More generally,
to any set C in the spectrum of an observable A, one can associate a predicate [A, C]
meaning "A is in C" and a well-defined projector E. The time-indexed predicate
stating that the value of A is in C at time t can be associated with the projector
E(t) = U⁻¹(t)EU(t) by taking into account the Schrödinger equation. Conversely,
any projector can be used to define a predicate as can be shown by taking A = E
and C = {1} in the spectrum of the projector E. One can now define states:
Axiom 4 assumes that the initial state of the system at time zero can be
described by a predicate E0. This kind of description can be shown to represent
correctly a preparation process once the theory is complete. A state operator ρ will
be defined as the quotient E0/TrE0. For instance, ρ = E0 = |ψ0⟩⟨ψ0| in the case
of a pure state. We shall also freely use, when necessary, the concept of a density
matrix.
HISTORIES
As introduced by Griffiths,6 a history of a quantum system S can be considered as
a series of conceptual snapshots describing some possible properties of the system
498 Roland Omnes
at different times. It will be found later on that a history becomes a true motion
picture in the classical limit when the system is macroscopic.
More precisely, let us choose a few ordered times 0 < t1 < ··· < tn, some
observables A1, ···, An which are not assumed to commute, and some ranges of values
C1, ···, Cn for each of these observables. A history [A1, ···, An; C1, ···, Cn; t1, ···, tn]
is a proposition telling us that at each time tj (j = 1, ···, n), Aj has its value in
the range Cj.
Griffiths proposed to assign a probability to such a history. We shall write it in
the form

w = Tr(En(tn)···E1(t1)·ρ·E1(t1)···En(tn)) .   (1)
Griffiths used a slightly different expression and he relied upon the Copenhagen
interpretation to justify it. Here Eq. (1) will be postulated with no further justifi-
cation, except to notice that it is "mathematically natural" when using Feynman
path summations because a projector Ej(tj) is associated with a window through
which the paths must go at time tj. It should be stressed that w is just for the time
being a mathematical measure associated with the story, having not yet any em-
pirical meaning that could be found by a series of measurements. Quite explicitly,
we don't assume that we know right now what a measurement is.
Griffiths noticed that some restrictions must be imposed upon the projectors
entering Eq.(1) in order to satisfy the basic axioms of probability theory and par-
ticularly the additivity property of the measures for two disjoint sets. To show what
that means, it will be enough to consider the simplest case where time takes only
two values t1 and t2, denoting by E1 (respectively E2) the projector associated with
a set C1 (respectively C2) and by Ē1 = 1 − E1 the orthogonal projector. In that
case, it can be proved that all the axioms of probability calculus are satisfied by
the definition in Eq. (1) if the following consistency condition holds:

Re Tr(E2(t2)·E1(t1)·ρ·Ē1(t1)) = 0 .   (2)
One knows how to write down similar necessary and sufficient conditions in the
general case. The essential point is that they are completely explicit.
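Since the conditions are completely explicit, they can be checked numerically. The following sketch (an added illustration; the unitary and the projectors are arbitrary choices, not taken from the text) evaluates two-time history probabilities from Eq. (1) for a spin 1/2 and tests whether the probabilities of disjoint histories add up correctly:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
rho = np.array([[1, 0], [0, 0]], dtype=complex)          # initial state |0><0|

def proj(op, sign):
    # spectral projector of an observable with eigenvalues +-1
    return (np.eye(2) + sign * op) / 2

theta = 0.7                                              # arbitrary evolution angle
U = np.cos(theta) * np.eye(2) - 1j * np.sin(theta) * sx  # U = exp(-i*theta*sx)
E2 = U.conj().T @ proj(sz, +1) @ U                       # Heisenberg projector E2(t2)

def w(E1, E2t):
    # Eq. (1) for two times, with t1 = 0 so that E1(t1) = E1
    return np.real(np.trace(E2t @ E1 @ rho @ E1 @ E2t))

results = {}
for name, A1 in (("sz", sz), ("sx", sx)):
    added = w(proj(A1, +1), E2) + w(proj(A1, -1), E2)    # sum over disjoint histories
    total = np.real(np.trace(E2 @ rho))                  # probability ignoring t1
    results[name] = (added, total)
    print(name, added, total)
# with A1 = sz the two probabilities add up (a consistent family);
# with A1 = sx they do not, so that family of histories is excluded
```

The failure in the second case is exactly a nonzero interference term of the kind the consistency condition forbids.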
LOGICAL STRUCTURE
Griffiths' histories will now be used to describe logically a system in both a rigorous
and an intuitive way.
First recall what logicians call a logic or, more properly, an interpretation
of formal logic. It consists of the following: one defines a field of propositions
(a, b, ···) together with four operations or relations among them, giving a meaning
to "a or b," "a and b," "not a," and "a implies b," this last relation being denoted
by a ⇒ b or "if a, then b." This is enough to do logic rigorously if some twenty or
so abstract rules are obeyed by "and, or, not, if...then." This kind of logic is also
called boolean.
Probability calculus is intimately linked with logic. One can make this clear by
choosing, for instance, two times t1 and t2 and two observables A1 and A2. The
spectrum σ1 of A1 will be divided into several regions {C1α}, and similarly for σ2.
An elementary rectangle C1α × C2β in the direct product σ1 × σ2 will be considered
as representing a Griffiths history or what a probabilist would call an elementary
event. A union of such sets is what a probabilist calls an event, and here it will be
called a proposition describing some possible properties of the system.
As usual in set theory, the logical operations "and, or, not" will be associated
with the intersection, the union, and the complementation of sets, so that these three
logical rules and the field of propositions or events are well defined.
When a proposition a is associated with a union of two disjoint sets a1, a2, each
one representing a history, its probability will be defined by

w(a) = w(a1) + w(a2) .   (3)

Conditional probabilities are defined as usual by

w(b | a) = w(a and b) / w(a) .   (4)
One can now introduce a unique and universal rule for the interpretation of
quantum mechanics, stating how to describe the properties of a physical system in
ordinary terms and how to reason about these properties:
Axiom 5: Any description of the properties of a system should be framed into
propositions belonging to a consistent logic. Any reasoning concerning them should
be the result of an implication or a chain of implications.
From here on, when the word "imply" is used, it will be in the sense of
this axiom. The logical construction allows us to give a clear-cut meaning to all the
reasonings an experimentalist is bound to make about his apparatuses. In practice,
it provides us with an explicit calculus of propositions, selecting automatically the
propositions making sense and giving proofs of correct reasonings. Two examples
will show how this works.
In a two-beam interference experiment, it is possible to introduce the
elementary predicates stating that, at some convenient time t2, a particle is in some
region of space where the two beams are recombined. All the predicates corresponding
to different regions describe the possible outcomes of the experiment, although
one does not yet know how to describe a counting device registering them. They
constitute a consistent logic. It is also possible to define a projector expressing that
the particle followed the upper beam but, lo and behold, there is no consistent logic
containing this predicate together with the previous predicates describing the
outcomes of the experiment. This means that logic precedes measurement. There is no
need to invoke an actual measurement to discard as meaningless the proposition
stating that the particle followed the upper beam. Logic is enough to dispose of
it according to the universal rule of interpretation, because there is no consistent
logic allowing such a statement.
More positively, one may also consider a particle coming out of an isotropic
S-state with a somewhat well-defined velocity. This property can be described by
an initial projector E0. Another projector E2 corresponds to the predicate stating
that the particle has its position within a very small volume δV2 around a point
x2 at time t2. Then one can explicitly choose a time t1 < t2, construct a volume
V1 that has its center on the way from the source to x2 and is big enough, and
prove the logical implication: "The particle is in δV2 at time t2 ⇒ the particle is in
V1 at time t1." So one can prove in this logical framework that the particle went
essentially along a straight trajectory. Similar results hold for the momentum at
time t1. To speak of position and momentum at the same time is also possible, as
will be seen later on, but with some restrictions.
Simple as they are, these two examples show that the universal rule of inter-
pretation is able to select meaningful propositions from meaningless ones and also
to provide a rational basis for some common sense statements which had to be
discarded by the Copenhagen interpretation.
CLASSICAL LIMIT
What we have called the universal rule of interpretation makes little avail of what
Bohr could have also called a universal rule of interpretation; namely the prop-
erties of a macroscopic device are described by classical physics. In fact, what he
really needed from classical physics was not so much classical dynamics as classical
logic where a property can be held to be either true or false, with no probabilistic
fuzziness.
Bohr's assumption is not as clear-cut as it once seemed since Leggett has shown
that some macroscopic systems consisting of a superconducting ring that has a
Josephson weak link can be in a quantum state.12,13 As a consequence, nobody
seems to be quite sure anymore what the Copenhagen interpretation really states
in this case.
The way out of this puzzle will be found by showing why and when classical
physics, i.e., classical dynamics together with classical logic, holds true as a conse-
quence of the universal interpretative rule. This is, of course, a drastic change of
viewpoint as compared with the familiar course of physics since it means that one
will try to prove why and when common sense can be applied rather than taking
it for granted as a gift of God. In that sense, it is also a scathing attack against
philosophical prejudice.
To begin with, one must make explicit what is a proposition in classical physics.
One may consider, for instance, giving the position and the momentum of a system
within some specified bounds. Such a statement is naturally associated with a cell
C in classical phase space (in that case a rectangular cell). Since motion will deform
such a cell, it looks reasonable to associate a classical predicate with a more or less
arbitrary cell in phase space. It will also be given a meaning as a quantum predicate
if one is able to associate a well-defined projector E(C) in Hilbert space with the
classical cell C in phase space.
If one remembers that, in semi-classical approximations, each quantum state
counts for a cell of volume hⁿ, n being the number of degrees of freedom, two
conditions should obviously be asked of the cell C:
1. It must be big enough, i.e., its phase space volume must be much larger than hⁿ.
2. It should be bulky enough, and with a smooth enough boundary, to be well tiled
by elementary regular cells.
This last condition can be made quite precise and, when both conditions are
met and the cell is simply connected, i.e., in one piece with no hole, we shall say
that the cell is regular.
Now there is a theorem stating that an approximate projector E(C) can be
associated with such a regular cell.10,15 To be precise, one can define it in terms
of coherent (gaussian) states |g_qp⟩ with average values (q, p) for their position and
momentum, putting

E(C) = ∫_C |g_qp⟩⟨g_qp| dq·dp/hⁿ .   (5)
It is easily found that the trace of E(C) is the semi-classical average number
N (= volume of C/hⁿ) of quantum states in C. In fact, E(C) is not exactly a
projector, but one can prove that
where L and P are typical dimensions of C along the configuration space and
momentum space directions. The kind of bound on the trace of an absolute-value
operator met in Eq. (6) is exactly what is needed to obtain classical logic from
quantum logic. Using E(C), or a true projector near enough to it, one is therefore able
to state a classical property as a quantum predicate. This kind of theorem relies
heavily upon microlocal analysis and, as such, it is non-trivial.
One may extend this kind of kinematical property to dynamical properties
by giving a quantum logical meaning to the classical history of a system. To do
so, given the hamiltonian H, one must first find the Hamilton function h(q, p)
associated with it. The answer is given by what is called in microlocal analysis the
Weyl symbol of the operator H and, in more familiar terms, the relation between
H and h(q, p) is exactly the one occurring between a density matrix ρ and the
associated Wigner distribution function23,24 f_ρ(q, p).
Once the Hamilton function h(q, p) is thus defined, one can write down the
classical Hamilton equations and discover the cell C1 which is the transform of
an initial regular cell C0 by classical motion during a time interval t. Of particular
interest is the case when C1 is also regular, and one will then say that the hamiltonian
(or the motion) is regular for the cell C0 during the time interval t. It will be seen
that regular systems are essentially deterministic, hence their great interest.
Since C0 and C1 are both regular, one can associate with them two approximate
projectors E0 and E1 as given by Eq. (5), satisfying condition (6). If E0 were treated
like a state operator, it would evolve according to quantum dynamics to become
after a time t the operator
Here ε is a small number depending upon C0, C1, and t, expressing both the
effect of classical motion and wave packet expansion. In a nutshell, this theorem
tells us that quantum dynamics logically coincides with classical dynamics, up to
an error of order ε, at least when regular systems are considered.
This theorem can be used to prove several results concerning the classical
behavior of a regular system. Considering several times 0 < t1 < ··· < tn and an
initial regular cell C0 becoming, successively via classical motion, the regular cells
C1, ···, Cn, one can use the projectors associated with these cells and their
complements to build up several quantum propositions. One can then use Eq. (8) to prove
that the quantum logic containing all these predicates is consistent. Furthermore,
if one denotes by [Cj, tj] the proposition stating that the system is in the cell Cj
at time tj [as characterized by the value 1 for the projector E(Cj)], one can prove
the implications

[Cj, tj] ⇒ [Ck, tk]   (9)

whatever the couple (j, k) in the set (1, ···, n). This implication is valid up to an
error ε, ε being controlled by the characteristics of the cells and the time tn, as
explained above.
Eq. (9) has far-reaching consequences. It tells us that classical logic, when
expressing the consequences of classical dynamics for a regular system and regular
cells, is valid. Of course, it is only valid up to a possible error ε, as shown by the
example of the Earth leaving the Sun or of a car getting out of a parking lot by
tunnel effect. This kind of probability is essentially the meaning of the number ε,
and its value is specific to each special case to be considered.
Furthermore, the implications in Eq. (9) entail that the properties of a regular
system exhibit, at least approximately, determinism (since the situation at some
time tj implies the situation at a later time tk). Such a system can also keep a
record or a memory (since the situation at a time tj implies the situation at an
earlier time tk). It will be convenient to call a potential fact such a chain of mutually
implying classical propositions. This name is used because determinism and recording
are essential characteristics of facts, but one should not, however, forget that
at the present stage the theory is still only just talk-talk-talk with no supporting
experiments, hence the term "potential," meaning an imaginary possibility.
Since Hagedorn has shown that wave packet spreading is mainly controlled by
quantities known from classical dynamics,7 the property of regularity can in principle
be checked completely within classical dynamics. An obvious counter-example of
a system not behaving regularly is provided by a superconducting quantum inter-
ference device in a quantum situation described by Leggett12,13 and investigated
by several experimentalists.18,19,20,21 Another example is given by a K-flow after
a time t large enough to allow a strong distortion of cells by mixing and we shall
come back to it later on.
atoms and the electrons in the ball and the wire are the microscopic coordinates.
Their number N is very large and they are collectively called the environment.
One may start from an initial situation where the collective coordinate is given
and the velocity is zero. More properly, this can be achieved by a gaussian state |ψ⟩
realizing these conditions on the average. It may be convenient to assume that the
ball and the wire are initially at zero temperature, so that the environment is in its
ground state |0⟩. So, the complete description of this initial state is given by

|Ψ⟩ = |ψ⟩ ⊗ |0⟩ .   (10)
Naively, one would say that the motion of the pendulum will generate
deformations of the wire and therefore elastic waves, or phonons, leading to dissipation.
If one compares two clearly different initial situations |ψ1⟩ and |ψ2⟩, the amount
of dissipation in each case after the same time interval will be different, so that the
corresponding states of the environment will become practically orthogonal as soon
as dissipation takes place.
Consider now the initial state

|Ψ⟩ = (a1|ψ1⟩ + a2|ψ2⟩) ⊗ |0⟩

and the density operator ρ = |Ψ⟩⟨Ψ|. The collective density matrix ρc, describing
only the collective coordinate, will be defined as the partial trace of ρ over the
environment. Putting |ψ⟩ = a1|ψ1⟩ + a2|ψ2⟩, which is a state of the collective
degrees of freedom only, one finds easily that

ρc(0) = (a1|ψ1⟩ + a2|ψ2⟩)(a1*⟨ψ1| + a2*⟨ψ2|) .   (11)
On the other hand, the orthogonality of environmental states noted previously
gives, once some dissipation has taken place,
$$\rho_c(t) = |a_1|^2\,|\varphi_1(t)\rangle\langle\varphi_1(t)| + |a_2|^2\,|\varphi_2(t)\rangle\langle\varphi_2(t)|, \qquad (12)$$
the state $|\varphi_1(t)\rangle$ being related to the initial state $|\varphi_1\rangle$ in a way exhibiting motion
and damping which need not interest us here. The essential point is the diagonal
form of $\rho_c(t)$, showing the disappearance of phase relations between the two states, or
what is called an effective superselection rule.25,26 It shows that the corresponding
potential facts are well separated (distinct), and the theory of measurement that
follows will also show them to be exclusive.
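This decoherence mechanism can be checked numerically. The following sketch (a two-state toy model invented for illustration, not taken from the chapter) builds the state $a_1|\varphi_1\rangle|e_1\rangle + a_2|\varphi_2\rangle|e_2\rangle$ and traces out the environment; the off-diagonal elements of the collective density matrix are proportional to the overlap $\langle e_1|e_2\rangle$, so they vanish as soon as dissipation has driven the environment states to orthogonality.

```python
import numpy as np

def reduced_density_matrix(a1, a2, overlap):
    """Collective density matrix for |Psi> = a1|phi1>|e1> + a2|phi2>|e2>,
    where <e1|e2> = overlap measures how little dissipation has
    distinguished the two environment states (1: none, 0: complete)."""
    phi1, phi2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    e1 = np.array([1.0, 0.0])
    e2 = np.array([overlap, np.sqrt(1.0 - overlap**2)])  # <e1|e2> = overlap
    psi = a1 * np.kron(phi1, e1) + a2 * np.kron(phi2, e2)
    rho = np.outer(psi, psi.conj())
    # partial trace over the two-dimensional environment factor
    return np.trace(rho.reshape(2, 2, 2, 2), axis1=1, axis2=3)

a = 1.0 / np.sqrt(2.0)
rho_before = reduced_density_matrix(a, a, overlap=1.0)  # no dissipation yet
rho_after = reduced_density_matrix(a, a, overlap=0.0)   # environments orthogonal
print(rho_before)  # off-diagonal terms present: interference survives
print(rho_after)   # diagonal form: effective superselection
```

For any intermediate overlap, the off-diagonal element equals $a_1 a_2^*\langle e_2|e_1\rangle$, so the suppression of interference tracks the distinguishability of the environment states.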
As is well known, these naive arguments can be replaced by serious
proofs,1,2,3,8,25,26 upon which we shall not elaborate, except for a significant remark.
The objection has been raised that effective superselection rules do not provide
a final proof of fact separation, for two different reasons:25,26
1. When the collective system is a harmonic oscillator and the environment con-
sists of a bath of harmonic oscillators linearly coupled to it, one can prove
Some Progress in Measurement Theory 505
occurrence of a fact, but it is able to make room for this uniqueness. Maybe there
is no cause after all, and the theory just describes what really is.
To get to these deep (or slippery) questions, one can follow Heisenberg's conven-
tion by calling true an actual fact (i.e., a unique recorded past fact as opposed to a
potential one). However, one may go further by relying upon the non-contradiction
theorem mentioned in Section 4 and consider a statement as reliable when it is the
logical consequence of a fact. For instance, when I see as a fact the track of a particle
in a bubble chamber, I can assert reliably that it came essentially along a straight
line before being detected. This is a simple instance where the somewhat formal
present theory is nearer to common sense than the Copenhagen interpretation.
MEASUREMENT THEORY
Measurement theory now becomes a mere exercise.14 To be specific, we shall only
consider here the measurement of an observable A belonging to a physical system
Q when the eigenvalues $\{a_n\}$ of A are non-degenerate and discrete, and the mea-
surement is of the so-called first kind, preserving this eigenvalue. There is no special
difficulty in treating more general cases.
A measuring apparatus M will be used to measure the observable A. It will
be convenient to consider a collective variable B of M as the measurement datum.
One can adapt the theory of facts to the case where there is friction and damping.
This allows us to consider as data the final position of a dial on a counter, or its
digital recording. In that case, the observable B can only take, after an irreversible
interaction with the environment lasting a time $\delta$, some values $b_0, b_1, \ldots, b_n, \ldots$,
which are the experimental data. Initially, B has the neutral value $b_0$. It should
be stressed that the measuring device is treated here by quantum mechanics but,
nevertheless and consistently, data are treated like facts.
It will be assumed that Q and M are initially non-interacting and that, because of
some wave-packet overlapping, they begin to interact at time $t_0$ and do not interact
any more after time $t_1 = t_0 + \delta$, when M has registered the data.
M will be assumed to be a perfect measuring apparatus of the first kind for
the observable A. This property can be made explicit by introducing the evolution
operator $S = U(t_0, t_1)$ for the Q + M system: it will be assumed that $S\,|a_n\rangle_Q\,|b_0, r\rangle_M$
(i.e., the effect of the interaction upon the initial state $|a_n\rangle$ and a
state of M characterized by the neutral initial marking $b_0$ and degeneracy indices
$r$) is only a linear superposition of some states $|a_q\rangle_Q\,|b_m, r'\rangle_M$, where
$q = m = n$. This semi-diagonality of the S-matrix is the only ingredient that one
needs to completely define a measurement.
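The semi-diagonality condition can be made concrete in a finite-dimensional sketch (an invented toy model, not the chapter's construction): take Q with two eigenstates and M with pointer values $b_0, b_1, b_2$, and let S be the permutation sending $|a_n\rangle|b_0\rangle$ to $|a_n\rangle|b_n\rangle$.

```python
import numpy as np

dQ, dM = 2, 3          # Q has eigenstates a_1, a_2; M has b_0 (neutral), b_1, b_2
dim = dQ * dM

def idx(n, m):
    """Basis index of |a_n> |b_m>  (n = 1..dQ, m = 0..dM-1)."""
    return (n - 1) * dM + m

# S swaps |a_n, b_0> <-> |a_n, b_n> and is the identity elsewhere:
# a permutation matrix, hence unitary, modeling a perfect first-kind measurement.
S = np.eye(dim)
for n in (1, 2):
    i, j = idx(n, 0), idx(n, n)
    S[[i, j]] = S[[j, i]]

# semi-diagonality: <a_q, b_m| S |a_n, b_0> vanishes unless q = m = n
for n in (1, 2):
    nonzero = np.flatnonzero(np.abs(S[:, idx(n, 0)]) > 1e-12)
    assert list(nonzero) == [idx(n, n)]
print("S unitary:", np.allclose(S @ S.T, np.eye(dim)))
```

Degeneracy indices r are omitted here; including them would replace each pointer state by a block, with S still semi-diagonal in the sense $q = m = n$.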
Now the logical game consists of introducing many predicates together with
their associated projectors: some of them describe the history of Q before mea-
surement, some others the history of Q after measurement, a predicate states the
initial value $b_0$, other predicates mention possible final data $b_n$, and finally some
predicates enunciate the possible values of A at time $t_0$ and at time $t_1$. One also
introduces the negations of these predicates, so as to obtain a field of propositions
for the measurement process, altogether forming a logic L.
The first question is to decide whether or not this logic L is consistent. To
answer it, it is convenient to introduce two logics L1 and L2 referring only to the
measured system Q: L1 tells stories of Q before measurement and asserts $A = a_n$
(or not) at time $t_0$; L2 begins with the initial statement $E_0 = |a_n\rangle\langle a_n|$ at time $t_1$
and tells stories of Q after measurement.
One can then prove that L is consistent if and only if L1 and L2 are respectively
consistent.
The occurrence of the initial predicate $E_0$ in L2 is obviously wave-packet re-
duction. Its precise meaning is the following: one can describe the story of Q
after measurement, once it again becomes an isolated system, but the data $B = b_n$
forces us to take the initial preparation predicate $E_0$. The basic nature of wave-
packet reduction turns out to be what logicians call in their own language a modus
ponens: you use, for instance, a modus ponens when you apply a theorem while
forgetting how you proved it, discarding the corresponding implications. Similarly,
one can discard the past history of Q and the whole history of M, taking into
account only the data $B = b_n$ when telling the story of Q after measurement.
One can do this consistently, but it is necessary to use $E_0$ as the initial predicate.
Notice that one might have chosen in mathematics to remember the proofs of all
theorems, and in physics to follow the story of every apparatus and every particle
that came to interact with Q at one time or another. In that sense, wave-packet
reduction is not really essential: it is only a very convenient result. Note, however,
that were we not to use it, we would have to define the initial state at time $t = -\infty$
and maybe introduce the whole universe in our description. So, in that sense, wave-
packet reduction is really very useful.
Knowing that the overall logic L is consistent, one can try to prove some of its
implications. The most interesting one is the following:
$$[B = b_n,\ t_1] \Rightarrow [A = a_n,\ t_1] \qquad (13)$$
or, in words: the result $A = a_n$ of the measurement is a logical consequence of the
data $B = b_n$. The nature of this relation between data and result was left in the
shadows by the Copenhagen interpretation, leading to difficulties such as the EPR
paradox.
Another theorem tells us that, under some trivial restrictions, if one performs
once again a measurement of A after a first measurement giving the result $a_n$,
the second result will also be $a_n$ ("repetitivity").
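Repetitivity is easy to simulate with a classical sampling sketch (invented for illustration, with made-up probabilities): since a first-kind measurement leaves the system in the observed eigenstate $|a_n\rangle$, an immediate second measurement of A reproduces the first result with certainty.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.36, 0.64])            # Born probabilities |c_n|^2 of |a_1>, |a_2>

def measure(probs):
    """Sample an outcome n; a first-kind measurement leaves the system
    in |a_n>, i.e. a post-measurement distribution concentrated on n."""
    n = rng.choice(len(probs), p=probs)
    post = np.zeros_like(probs)
    post[n] = 1.0
    return n, post

for _ in range(1000):
    n1, post = measure(p)
    n2, _ = measure(post)             # immediate second measurement of A
    assert n1 == n2
print("the second measurement always repeats the first")
```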
Finally, one can try to compute the probability for the predicate $[B = b_n, t_1]$
describing the experimental data. Because of the semi-diagonality of the S-matrix,
this probability turns out to depend only upon the properties of the Q-system and
not at all upon the irrelevant degeneracy indices r, which represent the model of the
apparatus, its type, its color, or its age. This probability is simply given by
$$w_n = \langle a_n|\,U(t_1)\,\rho\,U^{-1}(t_1)\,|a_n\rangle, \qquad (14)$$
508 Roland Omnes
i.e., Born's value for the probability of the result $A = a_n$. Using Axiom 3, one
can now consider a series of independent experimental trials, give meaning, as
indubitable fact, to the result of each trial, and therefore give an empirical meaning
to probabilities as representing the frequency of a given physical result. The final
link between the theory and empirical physics is then contained in a last axiom
expressing Born's interpretation of the wave function, i.e., Axiom 6: the theoretical
probability of an experimental result is equal to its empirical frequency.
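The link between Born's value and empirical frequency in a series of independent trials can be sketched by simulation (a minimal illustration with invented numbers, not the chapter's derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
psi = np.array([0.6, 0.8])                 # state of Q in the {|a_n>} basis
rho = np.outer(psi, psi.conj())
w = np.real(np.diag(rho))                  # Born probabilities w_n = <a_n|rho|a_n>

trials = 100_000                           # independent experimental trials
outcomes = rng.choice(len(w), size=trials, p=w)
freq = np.bincount(outcomes, minlength=len(w)) / trials
print(w, freq)                             # empirical frequencies approach w_n
```

With $10^5$ trials the frequencies agree with $w_n = (0.36, 0.64)$ to within a few parts in a thousand, which is the content of Axiom 6 in statistical form.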
So, finally, one has recovered the main results of the Copenhagen interpretation
without several of its limitations and paradoxes. The exact evaluation of these
results, as providing perhaps an escape from the difficulties of quantum mechanics,
will presumably need some time and much discussion, and it would be premature
to assert it now. However, it seems rather clear that the resulting interpretation
is objective.
$$w_i = \mathrm{Tr}(\rho E_i). \qquad (15)$$
The same results would follow from the effective density matrix
$$\rho_{\mathrm{eff}}(0) = \sum_i w_i\,\frac{E_i}{\mathrm{Tr}\,E_i}. \qquad (16)$$
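Equations (15) and (16) can be checked directly in a small matrix sketch (the dimensions, projectors, and state are invented for illustration):

```python
import numpy as np

def effective_density_matrix(rho, projectors):
    """w_i = Tr(rho E_i) and rho_eff(0) = sum_i w_i E_i / Tr(E_i)."""
    w = np.array([np.real(np.trace(rho @ E)) for E in projectors])
    rho_eff = sum(wi * E / np.trace(E) for wi, E in zip(w, projectors))
    return w, rho_eff

# two orthogonal cells (projectors) spanning a 3-dimensional collective space
E1 = np.diag([1.0, 1.0, 0.0])
E2 = np.diag([0.0, 0.0, 1.0])
psi = np.array([0.5, 0.5, np.sqrt(0.5)])
rho = np.outer(psi, psi)

w, rho_eff = effective_density_matrix(rho, [E1, E2])
print(w, np.trace(rho_eff))        # cell weights; rho_eff has unit trace
# coarse graining is self-consistent: rho_eff reproduces the same weights
w2, _ = effective_density_matrix(rho_eff, [E1, E2])
assert np.allclose(w, w2)
```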
One can then follow the successive measurements by using $\rho_{\mathrm{eff}}(0)$, letting it
evolve by $U(t)$ during a time interval $\Delta t$ where the cells remain regular, computing
the $w_i(\Delta t)$, and reconstructing $\rho_{\mathrm{eff}}(\Delta t)$ from them by using Eq. (16). The errors can be
estimated and they increase only linearly in time. The following results can then
be obtained at the rigorous level of theoretical physics, in contrast to mathematical
physics.17
1. The entropy
$$S_{\mathrm{eff}} = -k\,\mathrm{Tr}(\rho_{\mathrm{eff}}\log\rho_{\mathrm{eff}}) \qquad (17)$$
$$f(x,p) = \sum_j w_j\,\chi_j(x,p),$$
where $\chi_j(x,p)$ is the characteristic function of the domain $C_j$. The same procedure,
using the classical equations of motion, leads to a Markov process for the new $w_j$'s
(identical with the old ones for $t = 0$). Then one can show that the classical averages
for a slowly varying dynamical variable coincide with the quantum averages, except
for small, linearly increasing errors. So, classical physics is in fact retrieved, but only
in a statistical sense.
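The entropy of Eq. (17) is readily evaluated once $\rho_{\mathrm{eff}}$ is diagonal in the cell basis; the sketch below (with k set to 1 and invented weights) computes it from the eigenvalues:

```python
import numpy as np

k = 1.0  # Boltzmann's constant, set to 1 here

def S_eff(rho_eff):
    """S_eff = -k Tr(rho_eff log rho_eff), via the eigenvalues of rho_eff."""
    p = np.linalg.eigvalsh(rho_eff)
    p = p[p > 1e-15]                  # convention: 0 log 0 = 0
    return -k * float(np.sum(p * np.log(p)))

print(S_eff(np.diag([0.5, 0.5])))     # equal-weight mixture of two cells: ln 2
print(S_eff(np.diag([1.0, 0.0])))     # a single occupied cell: zero entropy
```

For a single occupied cell the entropy vanishes, and it grows as the weights spread over more cells.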
The EPR experiment is interesting from this point of view for two reasons:
first, because it has led to some puzzling considerations about the transfer of
information;5 furthermore, the non-contradiction theorem makes the logical in-
terpretation highly falsifiable, since any unsolvable paradox should kill it, and it is
interesting to submit it to this awesome test.
Let us, therefore, consider the EPR experiment, for instance in the old but
clear version where one has just two position operators $X_1$ and $X_2$ and conjugate
momenta $P_1$ and $P_2$. Defining the two commuting operators $X = X_1 - X_2$ and
$P = P_1 + P_2$, one considers the wave function
and one performs a precise simultaneous measurement of the two commuting ob-
servables $X_1$ and $P_2$. Let us assume that these measurements yield two data $D_1$
and $D_2$, as read on the corresponding measuring devices.
One can still play the logical game of measurement theory to investigate the
consistency of the process and find out its logical consequences.16 One easily proves,
for instance, the intuitively obvious result.
However, the troublesome and questionable implication standing at the root of the
EPR paradox,
$$\text{``}D_1 \text{ and } D_2\text{''} \Rightarrow \text{``}X_1 = x_1 \text{ and } P_1 = -p_2\text{''},$$
just does not work, because there is no consistent logic according to which it could
make sense. So, if one accepts the universal rule of interpretation, there is no hint
of a paradox and, furthermore, there can be no superluminal transfer of information,
since there is no logic in which such information might be consistently formu-
lated. Remembering that information theory is based upon probability theory, one
seems to have been all along fighting about propositions for which no consistent
probability exists.
The dissolution of the EPR paradox in the logical approach looks very simple,
and one may wonder whether this simplicity is not in some sense as puzzling as the
old paradox itself.
ACKNOWLEDGMENTS
Laboratoire de Physique Théorique et Hautes Énergies is a laboratoire associé
au CNRS.
REFERENCES
1. Caldeira, A. 0., and A. J. Leggett. "Quantum Tunneling in a Dissipative
System." Ann. Phys. 149 (1983):374.
2. Caldeira, A. 0., and A. J. Leggett. "Quantum Tunnelling in a Dissipative
System (erratum)." Ann. Phys. 153 (1983):44.
3. Feynman, R. P., and F. L. Vernon. "The Theory of a General Quantum Sys-
tem Interacting with a Linear Dissipative System." Ann. Phys. 24
(1963):118.
4. Ginibre, J., and G. Velo. "The Classical Limit of Scattering Theory for Non-
Relativistic Many Boson Systems." Comm. Math. Phys. 66 (1979):37.
5. Glauber, R. J. "Amplifiers, Attenuators and Schrödinger's Cat." Ann. N.Y.
Acad. Sci. 480 (1986):336.
6. Griffiths, R. B. "Consistent Histories and the Interpretation of Quantum Me-
chanics." J. Stat. Phys. 36 (1984):219.
7. Hagedorn, G. "Semi-Classical Quantum Mechanics." Ann. Phys. 135
(1981):58.
8. Hepp, K., and E. H. Lieb. "Phase Transitions in Reservoir-Driven Open Sys-
tems with Applications to Lasers and Superconductors." Helv. Phys. Acta 46
(1973):573.
9. Hepp, K. "The Classical Limit for Quantum Mechanical Correlation Func-
tions." Comm. Math. Phys. 35 (1974):265.
10. Hörmander, L. "On the Asymptotic Distribution of the Eigenvalues of
Pseudo-Differential Operators." Arkiv för Mat. 17 (1979):297.
11. Hörmander, L. The Analysis of Linear Partial Differential Operators, 4 volumes.
Berlin: Springer, 1985.
12. Leggett, A. J. Progr. Theor. Phys. 69 (Suppl) (1980):10.
13. Leggett, A. J. "Quantum Tunneling in the Presence of an Arbitrary Linear
Dissipation Mechanism." Phys. Rev. B30 (1984):1208.
14. Omnes, R. "Logical Reformulation of Quantum Mechanics." J. Stat. Phys.
53 (1988):893, 933, 957.
15. Omnes, R. "Projectors in Semi-Classical Physics." J. Stat. Phys. 57(1/2)
(1989).
16. Omnes, R. "The Einstein-Podolsky-Rosen Problem: A New Solution." Phys.
Lett. A 138 (1989):31.
17. Omnes, R. "From Hilbert Space to Common Sense: The Logical Interpreta-
tion of Quantum Mechanics." Unpublished.
18. Prance, H., T. D. Clark, J. E. Mutton, H. Prance, T. D. Spiller, R. J. Prance
et al. "Localization of Pair Charge States in a SQUID." Phys. Lett. 115A
(1986):125.
19. Prance, R. J., J. E. Mutton, H. Prance, T. D. Clark, A. Widom, and G.
Megaloudis. "First Direct Observation of the Quantum Behaviour of a Truly
Macroscopic Object." Helv. Phys. Acta 56 (1983):789.
20. Prance, R. J., et al. Phys. Lett. 107A (1985):133.
A
a priori probabilities, 129, 133
abnormal fluctuations, 321, 323, 325
absolute algorithmic information, 100, 102
absorber theory of radiation, 384
accessible information, 32
action at a distance, 384
adaptation, 139
adaptive evolution, 151, 185
adaptive landscape, 161
adaptive systems, 263, 295
Aharonov-Bohm effect, 5, 11
Aharonov-Bohm experiment, 11
algorithmic complexity, 76, 118, 130, 152, 193, 226, 228, 321-323, 375
algorithmic compressibility, 63-64
algorithmic entropy, 17, 76, 141, 144, 411
algorithmic independence, 80
algorithmic information, 93, 141, 199, 378
    absolute, 97, 100
    prior, 100, 107, 112
algorithmic information content, 73, 76
algorithmic information theory, 96-97, 127, 129-131, 133
algorithmic prior information, 100, 103, 107, 112
algorithmic randomness, 74-76, 208, 228
amplification, 10
    irreversible act of, 15
Anderson localization, 321, 327
anthropic principle, 9, 63
anthropic reasoning, 63
approximate probabilities, 428
arithmetic, 65-66
arrow of time, 62, 405, 407-408, 412, 416, 418-419, 439
    quantum mechanical, 416
    thermodynamical, 408
Aspect experiment, 368
asynchronous computation, 279
attractors, 160, 290, 292
available information, 129

B
baby universes, 467
baker's transformation, 263
band-merging cascades, 223
basin of attraction, 160
Bayesian inference, 234
Bayesian probability theory, 92, 387, 392
Bekenstein number, 6-7, 16
Bekenstein-Hawking information, 67
Bell's inequalities, 41, 376-377, 391, 411, 478, 480-482, 484
Bernoulli flow, 224
Bernoulli process, 246
Bernoulli shift, 263
Bernoulli-Turing machine, 225
BGS entropy, 359-360, 362, 364
    individual, 361
bifurcations, 237
big bang, 61-62, 417
big crunch, 417
bistability, 292
bit
    needed vs. available, 14, 16
    see "it from bit", 3
black holes, 6, 47-51, 53, 408, 418
    entropy of, 6, 47-48, 67
Boltzmann-Gibbs-Shannon entropy, 75, 359-362, 364
Boolean function, 158, 174
    canalizing, 166, 169
Boolean networks, 151, 155, 158-159, 161, 166
    autonomous, 158, 160
    on/off idealization, 156-157
    random, 162-163, 174
    selective adaptation in, 175
Born's interpretation of the wave function, 508
boundary, 4, 9
boundary conditions, 127-128
branch dependence, 450-451
branches, 440
516 Index
C
canonical ensemble, 100, 362
canton, 326
Carnot, 202
Carnot efficiency formula, 83
Casimir effect, 392, 401
Casimir/Unruh-correlations, 415
causality, 396
    relativistic, 349
cell differentiation, 165
cellular automata, 142, 262, 279-280, 290, 297, 314, 377
    1-D unidirectional, 297
    deterministic dynamics of, 297
    universal, 279
Chaitin-Kolmogorov complexity, 228
channel capacity, 32
chaos, 63, 209, 223, 490
chaotic systems, 63
Chomsky's computational hierarchy, 229
Chomsky's grammar, 223
Chomsky's hierarchy, 232, 250, 254-255, 265
Church-Tarski-Turing thesis, 81
Church-Turing thesis, 65, 73, 81, 225, 229
classical Bell's inequalities, 477, 479-484
classical ideal gas, 351
classical limit, 498, 501
classical logic, 503
classical spacetime, 459, 461, 466
classicity, 448
Clausius principle, 362
clock spins, 277
coarse graining, 411
coarse-grained density matrix, 447
coarse-grained sets of histories, 442
coarse-graining, 463
code, 30
code length, 118-124
coding, 30, 32, 36, 74, 95, 120
coding theory, 78, 83, 85
coevolution, 151, 185
cognizability, 66
coherence, 291
coherent states, 461, 501
    in quantum cosmology, 464
collapse of the wave function, 405-407, 413-416
collective behavior, 302
communication, 15-16
communication channel, 29
communication theory, 92, 94
complementarity, 4, 11-12, 17
complex adaptive systems, 453
complex dynamics, 291, 299
complex hierarchical structure, 299
complex macromolecules, 293
complex systems, 152
complexity, 61, 117, 137, 199, 209, 223, 263, 299, 420
    algorithmic, 226, 228
    and computational performance, 209
    Chaitin-Kolmogorov, 228
    conditional, 227
    graph, 234
    latent, 237
    physical, 226
    regular language, 235
complexity catastrophe, 177
complexity-entropy diagram, 263
composite system, 45
compressibility, 130, 132-133
compressibility of information, 83
computability in physical law, 65
computable universe, 66
computation, 223
computational complexity, 208
computational ecosystems, 208-209
computational ergodic theory, 264
computational limits, 67
computational time/space complexity, 141
computational universality, 139
computational velocity, 283
computer
    and consciousness, 15
    evolution of structure, 16
conditional algorithmic information, 98, 100, 106
    average, 101
conditional complexity, 227
conditional entropy, 233
conditional machine table, 294
conditional statistical information, 96
    average, 97
conditional switching dynamics, 294
consciousness, 5, 15
context-free Lindenmayer systems, 223
context-sensitive languages, 257
continuum, 378
H
H-theorem, 360
Hartle-Hawking path integral, 68
Hawking-Bekenstein formula, 67
Hawking formula, 51
Hawking radiation, 50
heat capacity, 248
Heisenberg spins, 337
hidden variables, 275
hierarchical structure, 291
history, 498
horizon, 54
Huffman's coding, 95-96, 195
human brain, 62, 64-68, 331
Huygens' principle, 56
hypercube architectures, 290

I
IGUS, 74-75, 453
incomplete information, 358
indeterminism
    quantum, 413-414
indexed grammar, 256
inequalities, 99-108
inertial observer, 54
inflation, 460
inflationary universe, 460
information, 193-194, 196-197, 223, 359, 382, 390, 405, 508
    absolute algorithmic, 100, 102
    accessible, 32
    algorithmic, 93, 97, 100, 107, 112, 141, 199, 378
    available, 129
    Bekenstein-Hawking, 67
    compressibility of, 83
    conditional algorithmic, 98, 100-101, 106
    conditional statistical, 96-97
    correlation, 33
    dissipation of, 459-460
    distance, 80
    dynamics, 316
    economy of, 45
    free, 234
    genetic, 194
    Gibbs-Shannon statistical, 92, 94, 96
    joint algorithmic, 98, 101
    loss of, 463
    metric, 80
    mutual, 30-31, 33, 141, 144, 244
    mutual algorithmic, 99, 101
    mutual statistical, 97
    physics, 408
    prior, 91, 94, 100, 103, 107, 112
    processing, 289, 299
    processing rate, 68
    Shannon, 236
    Shannon's theory of, 74, 80
    statistical, 93-94
    storage device, 294
    transmission rate, 35
    visual, 331
information-carrying sequences, 199
information-theoretic triangle inequality, 17
information theory, 3, 5, 8, 11-12, 17, 29, 96-97, 127, 129-131, 133, 225, 232, 332, 346
initial condition, 426
instantaneous code, 95, 100-101
interference fringes, 6
intermittency, 262, 319, 321, 325-327
interpretation of formal logic, 498
interpretation of quantum mechanics, 33, 495
irreversibility, 145, 357, 452, 484, 486-487
    quantum, 478
Ising model, 337
    1-D kinetic, 297
Ising spin system, 262
Ising spins, 337
isothermal expansion, 353
it from bit, 3, 5, 7-8, 11-12, 16

J
Jaynes' maximum entropy principle, 360
joint algorithmic information, 98, 101
joint algorithmic randomness, 78
joint statistical information, 97
Josephson junctions, 326
Josephson weak link, 501

K
K-flows, 264, 503, 508
K-systems, 509
Kelvin's efficiency limit, 112-113
Kholevo's theorem, 29, 31-32, 34-36
Kolmogorov entropy, 411, 508
Kolmogorov-Sinai entropy, 321
Kraft inequality, 85, 95, 120

L
labeled directed graphs, 230
Lamb-Retherford experiment, 394
Lamb shift, 384, 393-394, 396-397, 400
    in classical mechanics, 395
Landauer's principle, 113
Laplacean demon, 414
Larmor radiation law, 396
laser pulse, 292
latent complexity, 223, 237
lattice dynamical systems, 262
lattice gas, 314
law of initial conditions, 68
laws, 62-63, 66, 68
    Larmor radiation, 396
    mechanics, 65
    nature, 303
    physics, 9, 65, 67, 301
learning theory
    formal, 230
Lévy distribution, 324
lexicographic tree, 79
light cone, 346, 350
Lindenmayer systems, 223, 255
Liouville equation, 351, 409
    quantum, 412
local entropies, 414
local measurements, 39
local synchronization, 279
local transition table, 297
locality, 64
localized charge-transfer excitations, 291
logic
    structure of, 15
logical depth, 142, 229
logical functions, 295
logical interpretation, 495-496
logistic map, 231
lognormal distribution, 212
loop
    observer-participancy in, 8-9
Lorentz invariance, 312
Lorentz-invariant model of diffusion, 313
loss of information, 463
Lyapunov exponent, 260, 320, 324, 327

M
machine table, 293
macromolecules
    complex, 293
macroscopic quantum effects, 478
magnetic flux, 5
magnetometer
    and it from bit, 6
Manneville-Pomeau map, 320
many-worlds interpretation, 33, 472
Markov chain, 246, 325-326
Markov partition, 241
Markov process, 509
master equations, 409-410, 413, 415
mathematics, 65-66, 68
    unreasonable effectiveness of, 64
maximal device, 359
maximal sets of decohering histories, 445
maximum entropy, 124, 234, 389-390
maximum uncertainty/entropy principle, 360
Maxwell electrodynamics, 10
Maxwell's demon, 73-74, 81-82, 85, 93, 106, 109-112, 405-406
meaning, 13-16
measurement, 231, 290, 362-365, 451
measurement entropy, 358-359, 365
measurement problem, 363-364
measurement process, 291, 406
measurement situations, 451
measurement theory, 506
measurement/preparation process, 358
measuring device, 358
measuring instrument, 230
membrane, 201
    semipermeable, 347
metric complexity, 236
metric entropy, 225, 236
microcanonical ensemble, 100, 104-105
    entropy of, 105
Mind Projection Fallacy, 385
miniaturization, 289
minimal description, 79
minimal program, 97, 129, 152
minimum description length criterion, 121
Misiurewicz parameters, 241, 262, 264
mixed states, 358
Q
Qβ virus, 204
quantization effects, 289
quantum, 4
    also see photon, 14
quantum channels, 32
quantum-classical correspondence, 477
quantum communication, 29
quantum computation, 273
quantum computer, 290
quantum correlations, 411, 478
quantum cosmology, 63-64, 426, 459-461, 463-467
quantum-dots, 289
quantum electrodynamics, 383-384

R
radiation
    relict microwave, 14
random Boolean networks, 155, 162, 174
randomness, 224
real amplitudes, 44
reality is theory, 13
record, 451
recurrent states, 234
reduction
    in quantum measurements, 362
    of wave function, 406
reduction of the wave function, 406
regular language complexity, 235
regular languages, 232
T
Teich, W.G., 292
Telegdi, V., 463, 467
Thorne, Kip, 6
Tipler, Frank, 68
Toffoli, Tommaso, 276, 310, 314-315
Tomonaga, 370
Tsirelson, B.S., 478, 481

U
Uffink, J.B.M., 33
Unruh, W.G., 11, 54, 56, 431, 454, 463

V
VanVechten, D., 430
Velo, G., 502
Venn, John, 388
Vernon, J.R., 444
Vilenkin, A., 463
von Mises, R., 388
von Neumann, J., 346-348, 353, 358, 371, 497
von Schelling, F.W.J., 15

W
Wald, R.M., 55
Walden, R.W., 147
Wang, Xiao-Jing, 322, 325-326
Weaver, W., 85
Weinberg, Steven, 392
Weisbuch, Gerard, 170
Weisskopf, V.F., 394
Welton, T.A., 394, 397
Weyl, Hermann, 4, 9
Wheeler, John A., 10, 13, 67, 367, 377, 382, 394, 430, 463
White, Morton, 15
Wigner, Eugene, 64, 376, 386, 431, 463
Wold, H.O.A., 224, 263
Wolfram, Stephen, 142, 229, 262
Wootters, William, 5, 12, 17
Wright, Chauncey, 15

Z
Zee, A., 340
Zeh, H.D., 11, 430, 443, 449, 463, 465-466
Zurek, Wojciech H., 5-6, 17, 93, 95, 98-99, 104-105, 110, 112, 129, 205, 229, 430, 443, 463, 483
Zvonkin, A.K., 100
Zwanzig, 409, 411, 413