Wojciech H. Zurek - Complexity, Entropy and The Physics of Information-Westview Press (1990)
Volume VIII
Santa Fe Institute
Studies in the Sciences of Complexity
CRC Press
Taylor & Francis Group
Boca Raton London New York
It is, however, difficult to deny that the process of information gain can be
directly tied to the ability to extract useful work. Thus, questions concerning ther-
modynamics, the second law, and the arrow of time have become intertwined with
a half-century-old puzzle, that of the problem of measurements in quantum physics.
Quantum measurements are usually analyzed in abstract terms of wave functions
and Hamiltonians. Only very few discussions of the measurement problem
in quantum theory make an explicit effort to consider the crucial issue—the
transfer of information. Yet obtaining knowledge is the very reason for mak-
ing a measurement. Formulating quantum measurements and, more generally,
quantum phenomena in terms of information should throw a new light on the
problem of measurement, which has become difficult to ignore in light of new
experiments on quantum behavior in macroscopic systems.
The distinction between what is and what is known to be, so clear in classi-
cal physics, is blurred, and perhaps does not exist at all on a quantum level. For
instance, energetically insignificant interactions of an object with its quantum en-
vironment suffice to destroy its quantum nature. It is as if the "watchful eye" of
the environment "monitoring" the state of the quantum system forced it to behave
in an effectively classical manner. Yet, even phenomena involving gravity, which
happen on the most macroscopic of all the scales, bear the imprint of quantum
mechanics.
In fact it was recently suggested that the whole Universe—including configura-
tions of its gravitational field—may and should be described by means of quantum
theory. Interpreting results of the calculations performed on such a "Wavefunction
of the Universe" is difficult, as the rules of thumb usually involved in discussions
of experiments on atoms, photons, and electrons assume that the "measuring ap-
paratus" as well as "the observer" are much larger than the quantum system. This
is clearly not the case when the quantum system is the whole Universe. Moreover,
the transition from quantum to classical in the early epochs of the existence of the
Universe is likely to have influenced its present appearance.
Black hole thermodynamics has established a deep and still largely mysterious
connection between general relativity, quantum, and statistical mechanics.
Related questions about the information capacity of physical systems, funda-
mental limits on the capacity of communication channels, the origin of entropy
in the Universe, etc., are a subject of much recent research.
The three subjects above lie largely in the domain of physics. The following is-
sues forge connections between the natural sciences and the science of computation,
or, rather, the subject of information processing regarded in the broadest sense of
the word.
Physics of computation explores limitations imposed by the laws of physics
on the processing of information. It is now established that both classical and
quantum systems can be used to perform computations reversibly. That is,
computation can be "undone" by running the computer backwards. It appears
PROCEEDINGS
This book has emerged from a meeting held during the week of May 29 to June
2, 1989, at St. John's College in Santa Fe under the auspices of the Santa Fe
Institute. The (approximately 40) official participants as well as equally numerous
"groupies" were enticed to Santa Fe by the above "manifesto." The book—like
the "Complexity, Entropy and the Physics of Information" meeting—explores not
only the connections between quantum and classical physics, information and its
transfer, computation, and their significance for the formulation of physical theories,
but it also considers the origins and evolution of the information-processing entities,
their complexity, and the manner in which they analyze their perceptions to form
models of the Universe. As a result, the contributions can be divided into distinct
sections only with some difficulty.
Indeed, I regard this degree of overlapping as a measure of the success of the
meeting. It signifies consensus about the important questions and on the antic-
ipated answers: they presumably lie somewhere in the "border territory," where
information, physics, complexity, quantum, and computation all meet.
ACKNOWLEDGMENTS
I would like to thank the staff of the Santa Fe Institute for excellent (and friendly)
organizational support. In particular, Ginger Richardson was principally responsible
for letting "the order emerge out of chaos" during the meeting. And somehow Ronda
Butler-Villa managed the same feat with this volume.
I would like to gratefully acknowledge the Santa Fe Institute, the Air Force
Office for Scientific Research, and the Center for Nonlinear Studies, Los Alamos
National Laboratory, for the financial (and moral) support which made this meeting
possible.
—Wojciech H. Zurek
Los Alamos National Laboratory
and the Santa Fe Institute
Contents

Foreword
Wojciech H. Zurek ix

I Physics of Information 1

Information, Physics, Quantum: The Search for Links
John Archibald Wheeler 3

Complexity of Models
J. Rissanen 117

Valuable Information
Seth Lloyd 193

Non-Equilibrium Polymers, Entropy, and Algorithmic Information
Dilip K. Kondepudi 199

Indices 513
I Physics of Information

Information, Physics, Quantum: The Search for Links

John Archibald Wheeler
Physics Departments, Princeton University, Princeton, NJ 08544, and University of Texas at Austin, Austin, TX 78712
This report reviews what quantum physics and information theory have
to tell us about the age-old question, "How come existence?" No escape is
evident from four conclusions: (1) The world cannot be a giant machine,
ruled by any pre-established continuum physical law. (2) There is no such
thing at the microscopic level as space or time or spacetime continuum.
(3) The familiar probability function or functional, and wave equation
or functional wave equation, of standard quantum theory provide mere
continuum idealizations and by reason of this circumstance conceal the
information-theoretic source from which they derive. (4) No element in the
description of physics shows itself as closer to primordial than the elemen-
tary quantum phenomenon, that is, the elementary device-intermediated
act of posing a yes-no physical question and eliciting an answer or, in brief,
the elementary act of observer-participancy. Otherwise stated, every phys-
ical quantity, every it, derives its ultimate significance from bits, binary
yes-or-no indications, a conclusion which we epitomize in the phrase, it
from bit.
[2]The appendix of Kepler's Book 5 contains one side, the publications of the English physician
and thinker Robert Fludd (1574-1637) the other side, of a great debate, analyzed by Wolfgang
Pauli.85 Totally in contrast to Fludd's concept of intervention from on high63 was Kepler's guiding
principle, Ubi materia, ibi geometria—where there is matter, there is geometry. It was not
directly from Kepler's writings, however, that Newton learned of Kepler's three great geometry-
driven findings about the motions of the planets in space and in time, but from the distillation of
Kepler offered by Thomas Streete.106
JGST157 offers a brief and accessible summary of Einstein's 1915 and still standard geometro-
dynamics which capitalizes on Elie Cartan's appreciation of the central idea of the theory: the
boundary of a boundary is zero.
[3]See Bohr.17 The mathematics of complementarity I have not been able to discover stated any-
where more sharply, more generally and earlier than in H. Weyl,121 in the statement that the
totality of operators for all the physical quantities of the system in question form an irreducible
set.
The shift in interference fringes between field off and field on reveals the magnitude
of the flux,
This impulse is the source of the force that displaces the indicator needle of the
magnetometer and gives us an instrument reading. We deal with bits wholesale
rather than bits retail when we run the fiducial current through the magnetometer
coil, but the definition of fields founds itself no less decisively on bits.
As a third and final example of it from bit, we recall the wonderful quantum
finding of Bekenstein9,10,11—a totally unexpected denouement of the earlier clas-
sical work of Penrose, Christodoulou,26 and Ruffini27—refined by Hawking,52,53
that the surface area of the horizon of a black hole, rotating or not, measures the
entropy of the black hole. Thus this surface area, partitioned in the imagination
(Figure 1) into domains, each of the size 4ℏ log_e 2, that is, 2.77... times the Planck
area, yields the Bekenstein number, N; and the Bekenstein number, so Thorne and
Zurek explain,173 tells us the number of binary digits, the number of bits, that
would be required to specify in all detail the configuration of the constituents out
of which the black hole was put together. Entropy is a measure of lost information.
To no community of newborn outside observers can the black hole be made to reveal
out of which particular one of the 2^N configurations it was put together. Its size,
an it, is fixed by the number, N, of bits of information hidden within it.
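The arithmetic here is easy to check. A minimal sketch, my own and not from the text, assuming standard CGS values for G, c, and ℏ and using a solar-mass black hole as a test case: the bit count is N = A/(4 l_P² ln 2), where A is the horizon area and l_P² = ℏG/c³ is the Planck area.

```python
import math

# Assumed CGS constants (mine, not from the text)
G = 6.674e-8        # gravitational constant, cm^3 g^-1 s^-2
c = 2.998e10        # speed of light, cm/s
hbar = 1.055e-27    # reduced Planck constant, erg s
M_sun = 1.989e33    # solar mass, g

def bekenstein_bits(mass_g):
    """Bits hidden behind the horizon of a non-rotating black hole:
    N = A / (4 * l_P^2 * ln 2), with horizon area A = 4*pi*(2GM/c^2)^2."""
    r_s = 2 * G * mass_g / c**2       # Schwarzschild radius, cm
    area = 4 * math.pi * r_s**2       # horizon area, cm^2
    planck_area = hbar * G / c**3     # ~2.61e-66 cm^2, as quoted in the text
    return area / (4 * planck_area * math.log(2))

print(f"Planck area: {hbar * G / c**3:.3e} cm^2")
print(f"Solar-mass black hole: {bekenstein_bits(M_sun):.2e} bits")
```

For one solar mass the count comes out near 10^77 bits, each bit-domain occupying 4 ln 2 = 2.77... Planck areas as the text states.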
The quantum, h, in whatever current physics formula it appears, thus serves as
a lamp. It lets us see the horizon area as information lost, understand wave number
of light as photon momentum, and think of field flux as bit-registered fringe shift.
Giving us its as bits, the quantum presents us with physics as information.
How come a value for the quantum so small as ℏ = 2.612 × 10^-66 cm²? As
well ask why the speed of light is so great as c = 3 × 10^10 cm/s! No such constant
as the speed of light ever makes an appearance in a truly fundamental account
3. FOUR NO'S
To the question "How come the quantum?" we thus answer, "Because what we call
existence is an information-theoretic entity." But how come existence? Its as bits,
yes; and physics as information, yes; but whose information? How does the vision
of one world arise out of the information-gathering activities of many observer-
participants? In the consideration of these issues we adopt for guidelines four no's.
FIRST NO
"No tower of turtles," advised William James. Existence is not a globe supported
by an elephant, supported by a turtle, supported by yet another turtle, and so
on. In other words, no infinite regress. No structure, no plan of organization, no
framework of ideas underlaid by another structure or level of ideas, underlaid by
yet another level, and yet another, ad infinitum, down to bottomless blackness. To
endlessness no alternative is evident but a loop,[4] such as: Physics gives rise to
observer-participancy; observer-participancy gives rise to information; and infor-
mation gives rise to physics.
Is existence thus built99 on "insubstantial nothingness"? Rutherford and Bohr
made a table no less solid when they told us it was 99.9... percent emptiness.
Thomas Mann may exaggerate when he suggested that "... we are actually bringing
about what seems to be happening to us," but Leibniz69 reassures us that "although
the whole of this life were said to be nothing but a dream and the physical world
nothing but a phantasm, I should call this dream or phantasm real enough if, using
reason well, we were never deceived by it."
SECOND NO
No laws. "So far as we can see today, the laws of physics cannot have existed from
everlasting to everlasting. They must have come into being at the big bang. There
were no gears and pinions, no Swiss watchmakers to put things together, not even
a pre-existing plan.... Only a principle of organization which is no organization
at all would seem to offer itself. In all of mathematics, nothing of this kind more
obviously offers itself than the principle that 'the boundary of boundary is zero.'
Moreover, all three great field theories of physics use this principle twice over....
This circumstance would seem to give us some reassurance that we are talking sense
when we think of... physics being"142 as foundation-free as a logic loop, the closed
circuit of ideas in a self-referential deductive axiomatic system.105,34,70,159
The universe as a machine? Is this universe one among a great ensemble of
machine universes, each differing from the others in the values of the dimensionless
constants of physics? Is our own selected from this ensemble by an anthropic princi-
ple of one or another form?7 We reject here the concept of universe not least because
it "has to postulate explicitly or implicitly, a supermachine, a scheme, a device, a
miracle, which will turn out universes in infinite variety and infinite number."156
Directly opposite to the concept of universe as machine built on law is the
vision of a world self-synthesized. In this view, the notes struck out on a piano by
the observer-participants of all places and all times, bits though they are, in and
by themselves constitute the great wide world of space and time and things.
THIRD NO
No continuum. No continuum in mathematics and therefore no continuum in
physics. A half-century of development in the sphere of mathematical logic151 has
made it clear that there is no evidence supporting the belief in the existential char-
acter of the number continuum. "Belief in this transcendental world," Hermann
Weyl tells us, "taxes the strength of our faith hardly less than the doctrines of
the early Fathers of the Church or of the scholastic philosophers of the Middle
Ages."122 This lesson out of mathematics applies with equal strength to physics.
"Just as the introduction of the irrational numbers ... is a convenient myth [which]
simplifies the laws of arithmetic ...so physical objects," Willard Van Orman Quine
tells us,92 "are postulated entities which round out and simplify our account of the
flux of existence .... The conceptual scheme of physical objects is a convenient myth,
simpler than the literal truth and yet containing that literal truth as a scattered
part."
Nothing so much distinguishes physics as conceived today from mathematics
as the difference between the continuum character of the one and the discrete char-
acter of the other. Nothing does so much to extinguish this gap as the elementary
quantum phenomenon "brought to a close," as Bohr puts it,19 by "an irreversible
[5]See for example the survey by S. Feferman, "Turing in the Land of O(z)," pages 113-147, and
related papers on mathematical logic in R. Herken.56
4. FIVE CLUES
FIRST CLUE
The boundary of a boundary is zero. This central principle of algebraic topology,103
identity, triviality, tautology though it is, is also the unifying theme of Maxwell
electrodynamics, Einstein geometrodynamics, and almost every version of modern
field theory.[9] That one can get so much from so little, almost everything from
almost nothing, inspires hope that we will someday complete the mathematization
of physics and derive everything from nothing, all law from no law.
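The principle can be exhibited concretely. The following sketch, my own illustration rather than anything from the text, builds the two boundary operators of a single filled triangle and checks that their composition vanishes:

```python
# Boundary operators for one filled triangle [0,1,2], integer coefficients.
# Columns of B1: edges (0,1), (0,2), (1,2); rows: vertices 0, 1, 2.
# Each edge (i,j) has boundary (vertex j) - (vertex i).
B1 = [
    [-1, -1,  0],
    [ 1,  0, -1],
    [ 0,  1,  1],
]
# Column of B2: the face (0,1,2) on those edges: +(1,2) - (0,2) + (0,1).
B2 = [
    [ 1],
    [-1],
    [ 1],
]

def matmul(A, B):
    """Plain integer matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# "The boundary of a boundary is zero": the composition annihilates everything.
print(matmul(B1, B2))   # [[0], [0], [0]]
```

The oriented edge contributions cancel in pairs at every vertex, which is the identity-tautology-triviality character of the principle made visible.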
161Discovered among the graffiti in the men's room of the Pecan Street Cafe, Austin, Texas.
11 see wheekr.224 0.25 and MTW,77 section 43.4.
[8)See Wheeler,132 page 411.
Msee MTW77, Chapter 15; Atiyah,6 cartan,23 ,24 and Kheyfets and Wheeler.64
SECOND CLUE
No question, no answer. Better put, no bit-level question, no bit-level answer. So
it is in the game of twenty questions in its surprise version.[10] And so it is for
the electron circulating within the atom or a field within a space. To neither field
nor particle can we attribute a coordinate or momentum until a device operates to
measure the one or the other. Moreover, any apparatus that accurately measures
the one quantity inescapably rules out then and there the operation of equipment to
measure the other.17,18,55,121 In brief, the choice of question asked, and the choice
of when it's asked, play a part—not the whole part, but a part—in deciding what
we have the right to say.149,152
Bit-registration of a chosen property of the electron, a bit-registration of the
arrival of a photon, Aharonov-Bohm bit-based determination of the magnitude
of a field flux, bulk-based count of bits bound in a black hole: all are examples
of physics expressed in the language of information. However, into a bit count
that one might have thought to be a private matter, the rest of the nearby world
irresistibly thrusts itself. Thus the atom-to-atom distance in a ruler—basis for a
bit count of distance—evidently has no invariant status, depending as it does on
the temperature and pressure of the environment. Likewise the shift of fringes in
the Aharonov-Bohm experiment depends not only upon the magnetic flux itself,
but also on the charge of the electron. But this electron charge—when we take
the quantum itself to be nature's fundamental measuring unit—is governed by the
square root of the quantity e²/ℏc = 1/137.036 ..., a "constant" which—for extreme
conditions—is as dependent on the local environment47 as is a dielectric "constant"
or the atom-to-atom spacing in the ruler.
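The quoted value is simple to reproduce. A minimal check, assuming standard CGS values for e, ℏ, and c; the numerical constants are mine, not from the text:

```python
# Assumed CGS values (mine, not from the text): electron charge in esu,
# reduced Planck constant in erg s, speed of light in cm/s.
e_charge = 4.8032e-10
hbar = 1.0546e-27
c = 2.9979e10

# The dimensionless combination e^2 / (hbar * c): the fine-structure constant.
alpha = e_charge**2 / (hbar * c)
print(1.0 / alpha)   # close to 137.036
```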
The contribution of the environment becomes overwhelmingly evident when we
turn from length of bar or flux of field to the motion of alpha particle through cloud
chamber, dust particle through 3°K-background radiation or Moon through space.
This we know from the analyses of Bohr and Mott,79 Zeh,167,168 Joos and Zeh,61
Zurek,170,171,172 and Unruh and Zurek.113 It from bit, yes; but the rest of the world
also makes a contribution, a contribution that suitable experimental design can
minimize but not eliminate. Unimportant nuisance? No. Evidence the whole show
is wired up together? Yes. Objection to the concept of every it from bits? No.
Build physics, with its false face of continuity, on bits of information! What
this enterprise is we perhaps see more clearly when we examine for a moment a
thoughtful, careful, wide-reaching exposition51 of the directly opposite thesis, that
physics at bottom is continuous; that the bit of information is not the basic en-
tity. Rate as false the claim that the bit of information is the basic entity. Instead,
attempt to build everything on the foundation of some "grand unified field the-
ory" such as string theory26,46 —or, in default of that, on Einstein's 1915 and still
standard geometrodynamics. Hope to derive that theory by way of one or another
plausible line of reasoning. But don't try to derive quantum theory. Treat it as
supplied free of charge from on high. Treat quantum theory as a magic sausage
grinder which takes in as raw meat this theory, that theory, or the other theory,
and turns out a "wave equation," one solution of which is "the" wave function for
the universe.50,51,54,115,126 From start to finish accept continuity as right and nat-
ural: continuity in the manifold, continuity in the wave equation, continuity in its
solution, continuity in the features that it predicts. Among conceivable solutions
of this wave equation select as reasonable one which "maximally decoheres," one
which exhibits "maximal classicity"—maximal classicity by reason, not of "some-
thing external to the framework of wave function and Schrödinger equation," but
something in "the initial conditions of the universe specified within quantum theory
itself."
How do we compare the opposite outlooks of decoherence and it-from-bit?
Remove the casing that surrounds the workings of a giant computer. Examine the
bundles of wires that run here and there. What is the status of an individual wire?
The mathematical limit of the bundle? Or the building block of the bundle? The
one outlook regards the wave equation and wave function to be primordial and
precise and built on continuity, and the bit to be an idealization. The other outlook
regards the bit to be the primordial entity, and wave equation and wave function
to be secondary and approximate—and derived from bits via information theory.
Derived, yes; but how? No one has done more than William Wootters toward
opening up a pathway161,162 from information to quantum theory. He puts into
connection two findings, long known, but little known. Already before the ad-
vent of wave mechanics, he notes, the analyst of population statistics R. A. Fisher
proved40,41 that the proper tool to distinguish one population from another is not
the probability of this gene, that gene, and the third gene (for example), but the
square roots of these probabilities; that is to say, the two probability amplitudes,
each probability amplitude being a vector with three components. More precisely,
Wootters proves, the distinguishability between the two populations is measured by
the angle in Hilbert space between the two state vectors, both real. Fisher, however,
was dealing with information that sits "out there." In microphysics, however, the
information does not sit out there. Instead, nature in the small confronts us with a
revolutionary pistol, "No question, no answer." Complementarity rules. And com-
plementarity, as E. C. G. Stueckelberg proved107,108 as long ago as 1952, and as
Saxon made more readily understandable95 in 1964, demands that the probability
amplitudes of quantum physics must be complex. Thus Wootters derives famil-
iar Hilbert space with its familiar complex probability amplitudes from the twin
demands of complementarity and measure of distinguishability.
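Wootters' measure of distinguishability can be sketched in a few lines. This is my own illustration, not code from the text: represent each population by the square roots of its probabilities, a real unit vector, and take the angle between the two vectors.

```python
import math

def statistical_angle(p, q):
    """Fisher/Wootters statistical distance: the angle between the real
    amplitude vectors (sqrt(p_i)) and (sqrt(q_i)). Both p and q must be
    probability distributions over the same set of outcomes."""
    overlap = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp against floating-point drift before taking the arccosine.
    return math.acos(min(1.0, max(-1.0, overlap)))

# Two three-allele gene-frequency "populations," as in Fisher's setting
# (illustrative numbers, mine):
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(statistical_angle(p, p))   # identical populations: angle 0
print(statistical_angle(p, q))   # nearby populations: small positive angle
```

Identical distributions sit at angle zero, distributions with disjoint support at a right angle; each probability amplitude is indeed a unit vector with one component per outcome, exactly as in the three-gene example above.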
Should we try to go on from Wootters' finding to deduce the full blown machin-
ery of quantum field theory? Exactly not to try to do so—except as an idealization—
is the demand laid on us by the concept of it from bit. How come?
Probabilities exist "out there" no more than do space or time or the position
of the atomic electron. Probability, like time, is a concept invented by humans, and
humans have to bear the responsibility for the obscurities that attend it. Obscurities
there are whether we consider probability defined as frequency67 or defined a la
Bayes.60,94,97,114 Probability in the sense of frequency has no meaning as applied
to the spontaneous fission of the particular plutonium nucleus that triggered the
November 1, 1952 H-bomb blast.
What about probabilities of a Bayesian cast, probabilities "interpreted not
as frequencies observable through experiments, but as degrees of plausibility one
assigns to each hypothesis based on the data and on one's assessment of the plausi-
bility of the hypotheses prior to seeing the data"?31 Belief-dependent probabilities,
different probabilities assigned to the same proposition by different people?14 Proba-
bilities associated21 with the view that "objective reality is simply an interpretation
of data agreed to by large numbers of people?"
Heisenberg directs us to the experiences8 of the early nuclear-reaction-rate the-
orist Fritz Houtermans, imprisoned in Kharkov during the time of the Stalin ter-
ror: "... the whole cell would get together to produce an adequate confession ...
[and] helped [the prisoners] to compose their 'legends' and phrase them properly,
implicating as few others as possible."
Existence as confession? A myopic but in some ways illuminating formulation
of the demand for intercommunication implicit in the theme of it from bit!
So much for "No question, no answer."
THIRD CLUE
The super-Copernican principle.188 This principle rejects now-centeredness in any
account of existence as firmly as Copernicus repudiated here-centeredness. It re-
pudiates most of all any tacit adoption of now-centeredness in assessing observer-
participants and their number.
What is an observer-participant? One who operates an observing device and
participates in the making of meaning, meaning in the sense of Follesdal,42 "Mean-
ing is the joint product of all the evidence that is available to those who commu-
nicate." Evidence that is available? The investigator slices a rock and photographs
the evidence for the heavy nucleus that arrived in the cosmic radiation of a billion
years ago.149 Before he can communicate his findings, however, an asteroid atom-
izes his laboratory, his records, his rocks, and him. No contribution to meaning!
Or at least no contribution then. A forensic investigation of sufficient detail and
wit to reconstruct the evidence of the arrival of that nucleus is difficult to imagine.
What about the famous tree that fell in the forest with no one around?18 It leaves
a fallout of physical evidence so near at hand and so rich that a team of up-to-
date investigators can establish what happened beyond all doubt. Their findings
contribute to the establishment of meaning.
"Measurements and observations," it has been said,58 "cannot be fundamental
notions in a theory which seeks to discuss the early universe when neither existed."
On this view the past has a status beyond all questions of observer-participancy.
It from bit offers us a different vision: "reality is theory"[11]; "the past has no
evidence except as it is recorded in the present."[12] The photon that we are going
[11]See T. Segerstedt as quoted in Wheeler,132 page 415.
[12]See Wheeler,131 page 41.
to register tonight from that four billion-year-old quasar cannot be said to have
had an existence "out there" three billion years ago, or two (when it passed an
intervening gravitational lens) or one, or even a day ago. Not until we have fixed
arrangements at our telescope do we register tonight's quantum as having passed
to the left (or right) of the lens or by both routes (as in a double-slit experiment).
This registration, like every delayed-choice experiment,75,131 reminds us that no
elementary quantum phenomenon is a phenomenon until, in Bohr's words,19 "It
has been brought to a close" by "an irreversible act of amplification." What we call
the past is built on bits.
Enough bits to structure a universe so rich in features as we know this world to
be? Preposterous! Mice and men and all on Earth who may ever come to rank as
intercommunicating meaning-establishing observer-participants will never mount a
bit count sufficient to bear so great a burden.
The count of bits needed, huge though it may be, nevertheless, so far as we
can judge, does not reach infinity. In default of a better estimate, we follow familiar
reasoning189 and translate into the language of bits the entropy of the primordial
cosmic fireball as deduced from the entropy of the present 2.735 deg K (uncertainty
< 0.05 deg K) microwave relict radiation totaled over a 3-sphere of radius 13.2 ×
10^9 light years (uncertainty > 35%)[13] or 1.25 × 10^28 cm and of volume 2π² radius³:

    number of bits = 8 × 10^88.
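The figure of 8 × 10^88 follows from elementary arithmetic. A rough sketch, assuming standard CGS values for the radiation constant and Boltzmann's constant (those two constants are mine; the temperature and radius are the ones quoted above):

```python
import math

# Assumed CGS constants (mine, not from the text)
a_rad = 7.566e-15    # blackbody radiation constant, erg cm^-3 K^-4
k_B = 1.381e-16      # Boltzmann constant, erg/K

T = 2.735            # present microwave-background temperature, K (from the text)
radius = 1.25e28     # 3-sphere radius, cm (from the text)

entropy_density = (4.0 / 3.0) * a_rad * T**3           # erg cm^-3 K^-1
volume = 2 * math.pi**2 * radius**3                    # 3-sphere volume, cm^3
bits = entropy_density * volume / (k_B * math.log(2))  # total entropy in bits

print(f"{bits:.1e} bits")   # of order 10^88
```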
It would be totally out of place to compare this overpowering number with the num-
ber of bits of information elicited up to date by observer-participancy. So warns the
super-Copernican principle. We today, to be sure, through our registering devices,
give a tangible meaning to the history of the photon that started on its way from
a distant quasar long before there was any observer-participancy anywhere. How-
ever, the far more numerous establishers of meaning of time to come have a like
inescapable part—by device-elicited question and registration of answer—in gener-
ating the "reality" of today. For this purpose, moreover, there are billions of years
yet to come, billions on billions of sites of observer-participancy yet to be occu-
pied. How far foot and ferry have carried meaning-making communication in fifty
thousand years gives faint feel for how far interstellar propagation is destined82,59
to carry it in fifty billion years.
Do bits needed balance bits achievable? They must, declares the concept of
"world as system self-synthesized by quantum networking."156 By no prediction
does this concept more clearly expose itself to destruction, in the sense of Popper.90
[13]See MTW,77 page 738, Box 27.4; or JGST,157 Chapter 13, page 242.
FOURTH CLUE
"Consciousness." We have traveled what may seem a dizzying path. First, ele-
mentary quantum phenomenon brought to a close by an irreversible act of am-
plification. Second, the resulting information expressed in the form of bits. Third,
this information used by observer-participants—via communication—to establish
meaning. Fourth, from the past through the billenniums to come, so many observer-
participants, so many bits, so much exchange of information, as to build what we
call existence.
Doesn't this it-from-bit view of existence seek to elucidate the physical world,
about which we know something, in terms of an entity about which we know al-
most nothing, consciousness?22,33,43,44 And doesn't Marie Sklodowska Curie tell us,
"Physics deals with things, not people"? Using such and such equipment, making
such and such a measurement, I get such and such a number. Who I am has nothing
to do with this finding. Or does it? Am I sleepwalking?28 Or am I one of those poor
souls without the critical power to save himself from pathological science?57,100,66
Under such circumstances any claim to have "measured" something falls flat until it
can be checked out with one's fellows. Checked how? Morton White reminds us
how the community applies its tests of credibility, and in this connection quotes
analyses by Chauncey Wright, Josiah Royce, and Charles Sanders Peirce.[14] Par-
menides of Elea83 (c. 515 B.C.-450 B.C.) may tell us that "What is ... is identical
with the thought that recognizes it." We, however, steer clear of the issues con-
nected with "consciousness." The line between the unconscious and the conscious
begins to fade91 in our day as computers evolve and develop—as mathematics has—
level upon level upon level of logical structure. We may someday have to enlarge
the scope of what we mean by a "who." This granted, we continue to accept—as
an essential part of the concept of it from bit—Follesdal's guideline,42 "Meaning is
the joint product of all the evidence that is available to those who communicate."
What shall we say of a view of existence[15] that appears, if not anthropomorphic in
its use of the word "who," still overly centered on life and consciousness? It would
seem more reasonable to dismiss for the present the semantic overtones of "who"
and explore and exploit the insights to be won from the phrases, "communication"
and "communication employed to establish meaning."
Follesdal's statement supplies not an answer, but the doorway to new questions.
For example, man has not yet learned how to communicate with an ant. When he
[14]See Peirce,87 especially passages from pages 335-337, 353, and 358. Peirce's position on the
forces of nature, "May they not have naturally grown up," foreshadow though it does the concept
of the world as a self-synthesized system, differs from it in one decisive point, in that it tacitly
takes time as primordial category supplied free of charge from outside.
[15]See von Schelling, especially volume 5, pages 428-430, as kindly summarized for me by
B. Kanitscheider: "that the Universe possesses from the outset an immanent goal, a teleological
structure, and in all its products is directed toward evolutionary stages that finally include the
bringing forth of self-consciousness, which then in turn reflects upon the process of its own
genesis; and this reflection is the necessary condition for the constitution of the objects of
consciousness."
does, will the questions put to the world around by the ant and the answers that
he elicits contribute their share, too, to the establishment of meaning? As another
issue associated with communication, we have yet to learn how to draw the line
between a communication network that is closed, or parochial, and one that is
open. And how to use that difference to distinguish between reality and poker—or
another game116,118—so intense as to appear more real than reality. No term in
Follesdal's statement poses greater challenge to reflection than "communication,"
descriptor of a domain of investigation88,98,93 that enlarges in sophistication with
each passing year.
More is different.5 Not by plan but by inner necessity, a sufficiently large number of
H2O molecules collected in a box will manifest solid, liquid, and gas phases. Phase
changes, superfluidity, and superconductivity all bear witness to Anderson's pithy
point, more is different.
We do not have to turn to objects so material as electrons, atoms, and molecules
to see big numbers generating new features. The evolution from small to large has
already in a few decades forced on the computer a structure73,96 reminiscent of bi-
ology by reason of its segregation of different activities into distinct organs. Distinct
organs, too, the giant telecommunications system of today finds itself inescapably
evolving. Will we someday understand time and space and all the other fea-
tures that distinguish physics—and existence itself—as the similarly self-generated
organs of a self-synthesized information system?165,65
5. CONCLUSION
The spacetime continuum? Even continuum existence itself? Except as an idealiza-
tion neither the one entity nor the other can make any claim to be a primordial
category in the description of nature. It is wrong, moreover, to regard this or that
physical quantity as sitting "out there" with this or that numerical value in default
of the question asked and the answer obtained by way of an appropriate observing
device. The information thus solicited makes physics and comes in bits. The count
of bits drowned in the dark night of a black hole displays itself as horizon area,
expressed in the language of the Bekenstein number. The bit count of the cosmos,
however it is figured, is ten raised to a very large power. So also is the number of
elementary acts of observer-participancy over any time of the order of fifty billion
years. And, except via those time-leaping quantum phenomena that we rate as el-
ementary acts of observer-participancy, no way has ever offered itself to construct
what we call "reality." That's why we take seriously the theme of it from bit.
Information, Physics, Quantum: The Search for Links 17
6. AGENDA
Intimidating though the problem of existence continues to be, the theme of it from
bit breaks it down into six issues that invite exploration:
1. Go beyond Wootters and determine what, if anything, has to be added to dis-
tinguishability and complementarity to obtain all of standard quantum theory.
2. Translate the quantum versions of string theory and of Einstein's geometrody-
namics from the language of continuum to the language of bits.
3. Sharpen the concept of bit. Determine whether "an elementary quantum phe-
nomenon brought to a close by an irreversible act of amplification" has at bottom
(1) the 0-or-1 sharpness of definition of bit number nineteen in a string of binary
digits, or (2) the accordion property of a mathematical theorem, the length of
which, that is, the number of supplementary lemmas contained in which, the
analyst can stretch or shrink according to his convenience.
4. Survey one by one with an imaginative eye the powerful tools that mathematics
—including mathematical logic—has won and now offers to deal with theorems
on a wholesale rather than a retail level, and for each such technique work
out the transcription into the world of bits. Give special attention to one and
another deductive axiomatic system which is able to refer to itself,102 one and
another self-referential deductive system.
5. From the wheels-upon-wheels-upon-wheels evolution of computer programming
dig out, systematize, and display every feature that illuminates the level-upon-
level-upon-level structure of physics.
6. Capitalize on the findings and outlooks of information theory,25,30,111,166 algo-
rithmic entropy,174 evolution of organisms,35,33,81 and pattern recogni-
tion.1,13,48,76,101,104,110,119 Search out every link that each has with physics
at the quantum level. Consider, for instance, the string of bits 1111111... and
its representation as the sum of the two strings 1001110... and 0110001.... Ex-
plore and exploit the connection between this information-theoretic statement
and the findings of theory and experiment on the correlation between the polar-
izations of the two photons emitted in the annihilation of singlet positronium123
and in like Einstein-Podolsky-Rosen experiments.16 Seek out, moreover, every
realization in the realm of physics of the information-theoretic triangle inequal-
ity recently discovered by Zurek.173
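The bit-string decomposition mentioned in item 6 can be checked mechanically; the sketch below assumes "sum" means bitwise addition modulo 2 and recovers the all-ones string from the two strings quoted above:

```python
# Bitwise sum modulo 2 (XOR) of the two seven-bit strings quoted in item 6.
a = "1001110"
b = "0110001"
total = "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))
print(total)  # the two strings disagree in every position, so the sum is all ones
```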
Deplore? No, celebrate the absence of a clean clear definition of the term "bit"
as the elementary unit in the establishment of meaning. We reject "that view of
science which used to say, 'Define your terms before you proceed.' The truly creative
nature of any forward step in human knowledge," we know, "is such that theory,
concept, law, and method of measurement—forever inseparable—are born into the
world in union."109 If and when we learn how to combine bits in fantastically large
numbers to obtain what we call existence, we will know better what we mean both
by bit and by existence.
A single question animates this report: Can we ever expect to understand ex-
istence? Clues we have, and work to do, to make headway on that issue. Surely
someday, we can believe, we will grasp the central idea of it all as so simple, so
beautiful, so compelling that we will say to each other, "Oh, how could it have
been otherwise! How could we all have been so blind so long!"
ACKNOWLEDGMENTS
For discussion, advice, or judgment on one or another issue taken up in this review,
I am indebted to Nandor Balazs, John D. Barrow, Charles H. Bennett, David
Deutsch, Robert H. Dicke, Freeman Dyson, and the late Richard P. Feynman as
well as David Gross, James B. Hartle, John J. Hopfield, Paul C. Jeffries, Bernulf
Kanitscheider, Arkady Kheyfets, and Rolf W. Landauer; and to Warner A. Miller,
John R. Pierce, Willard Van Orman Quine, Benjamin Schumacher, and Frank J.
Tipler as well as William G. Unruh, Morton White, Eugene P. Wigner, William K.
Wootters, Hans Dieter Zeh, and Wojciech H. Zurek. For assistance in preparation
of this report I thank E. L. Bennett and NSF grant PHY245-6243 to Princeton
University. I give special thanks to the Santa Fe Institute and the organizers of the
May-June 1989 Conference on Complexity, Entropy, and the Physics of Information
at which the then-current version of the present analysis was reported.
This report evolved from presentations at the Santa Fe Institute conferences,
May 29-June 2 and June 4-8, 1989, and at the 3rd International Symposium on
Foundations of Quantum Mechanics in the Light of New Technology, Tokyo, Au-
gust 28-31, 1989, under the title "Information, Physics, Quantum: The Search
for Links"; and headed "Can We Ever Expect to Understand Existence?" as the
Penrose Lecture at the April 20-22, 1989, annual meeting of Benjamin Franklin's
"American Philosophical Society, Held at Philadelphia for Promoting Useful Knowl-
edge," and at the Accademia Nazionale dei Lincei Conference on La Verità nella
Scienza, Rome, October 13, 1989; submitted to the proceedings of all four in ful-
fillment of obligation and in deep appreciation for hospitality.
REFERENCES
Three reference abbreviations: JGST=157, MTW=77, and WZ=148.
1. Agu, M. "Field Theory of Pattern Recognition." Phys. Rev. A 37 (1988):
4415-4418.
2. Aharonov, Y., and D. Bohm. "Significance of Electromagnetic Potentials in
the Quantum Theory." Phys. Rev. 115 (1959):485-491.
3. Anandan, J. "Comment on Geometric Phase for Classical Field Theories."
Phys. Rev. Lett. 60 (1988):2555.
4. Anandan, J., and Y. Aharonov. "Geometric Quantum Phase and Angles." Phys.
Rev. D 38 (1988):1863-1870. Includes references to the literature of the sub-
ject.
5. Anderson, P. W. "More is Different." Science 177 (1972):393-396.
6. Atiyah, M. Collected Papers, Vol. 5: Gauge Theories. Oxford: Clarendon,
1988.
7. Barrow, J. D., and F. J. Tipler. The Anthropic Cosmological Principle. New
York: Oxford Univ. Press, 1986. Also the literature therein cited.
8. Beck, F. [pseudonym of the early nuclear-reaction-rate theorist Fritz Houter-
mans], and W. Godin. Translated from the German original by E. Mosbacher
and D. Porter. Russian Purge and the Extraction of Confessions. London:
Hurst and Blackett, 1951.
9. Bekenstein, J. D. "Black Holes and the Second Law." Nuovo Cimento Lett. 4
(1972):737-740.
10. Bekenstein, J. D. "Generalized Second Law of Thermodynamics in Black-Hole
Physics." Phys. Rev. D 8 (1973):3292-3300.
11. Bekenstein, J. D. "Black-Hole Thermodynamics." Physics Today 33 (1980):
24-31.
12. Bell, J. S. Collected Papers in Quantum Mechanics. Cambridge, UK: Cam-
bridge Univ. Press, 1987.
13. Bennett, B. M., D. D. Hoffman, and C. Prakash Observer Mechanics: A
Formal Theory of Perception. San Diego: Academic Press, 1989.
14. Berger, J. O., and D. A. Berry. "Statistical Analysis and the Illusion of Ob-
jectivity." Am. Scientist 76 (1988):159-165.
15. Berkeley, G. Treatise Concerning the Principles of Understanding. Dublin,
1710; 2nd edition, 1734. Regarding his reasoning that "No object exists apart
from mind," cf. article on Berkeley by R. Adamson, Encyclopaedia Britannica,
Chicago 3 (1959), 438.
16. Bohm, D. "The Paradox of Einstein, Rosen and Podolsky." Originally pub-
lished in Quantum Theory, chapter 22, sections 15-19. Englewood Cliffs, NJ:
Prentice-Hall, 1951. Reprinted in WZ,148 pp. 356-368.
17. Bohr, N. "The Quantum Postulate and the Recent Development of Atomic
Theory." Nature 121 (1928):580-590.
18. Bohr, N., and L. Rosenfeld. "Zur Frage der Messbarkeit der elektromagnetis-
chen Feldgrössen." Mat.-fys. Medd. Dan. Vid. Selsk. 12(8) (1933). English
translation by Aage Petersen, 1979; reprinted in WZ,148 pp. 479-534.
19. Bohr, N. "Can Quantum-Mechanical Description of Physical Reality be Con-
sidered Complete?" Phys. Rev. 48 (1935):696-702. Reprinted in WZ,148 pp.
145-151.
20. Brink, L., and M. Henneaux. Principles of String Theory: Studies of the Cen-
tro de Estudios Cientificos de Santiago. New York: Plenum, 1988.
21. Burke, J. The Day the Universe Changed. Boston, MA: Little, Brown, 1985.
22. Calvin, W. H. The Cerebral Symphony. New York: Bantam, 1990.
23. Cartan, E. La Géométrie des Espaces de Riemann, Mémorial des Sciences
Mathématiques. Paris: Gauthier-Villars, 1925.
24. Cartan, E. Leçons sur la Géométrie des Espaces de Riemann. Paris: Gauthier-
Villars, 1925.
25. Chaitin, G. J. Algorithmic Information Theory, revised 1987 edition. Cam-
bridge, UK: Cambridge Univ. Press, 1988.
26. Christodoulou, D. "Reversible and Irreversible Transformations in Black-Hole
Physics." Phys. Rev. Lett. 25 (1970):1596-1597.
27. Christodoulou, D., and R. Ruffini. "Reversible Transformations of a Charged
Black Hole." Phys. Rev. D 4 (1971):3552-3555.
28. Collins, W. W. The Moonstone. London, 1868.
29. Darwin, C. W. (1809-1882). On the Origin of Species by Means of Natural
Selection, or the Preservation of Favoured Races in the Struggle for Life. Lon-
don, 1859.
30. Delahaye, J.-P. "Chaitin's Equation: An Extension of Gödel's Theorem." No-
tices Amer. Math. Soc. 36 (1989):984-987.
31. Denning, P. J. "Bayesian Learning." Am. Scientist 77 (1989):216-218.
32. d'Espagnat, B. Reality and the Physicist: Knowledge, Duration and the Quan-
tum World. Cambridge, UK: Cambridge Univ. Press, 1989.
33. Edelman, G. M. Neural Darwinism. New York: Basic Books, 1987.
34. Ehresmann, C. Categories et Structures. Paris: Dunod, 1965.
35. Eigen, M., and R. Winkler. Das Spiel: Naturgesetze steuern den Zufall.
München: Piper, 1975.
36. Einstein, A., to J. J. Laub, 1908, undated, Einstein Archives; scheduled for
publication in The Collected Papers of Albert Einstein, a group of volumes on
the Swiss years 1902-1914, Volume 5: Correspondence, 1902-1914, Princeton
University Press, Princeton, New Jersey.
37. Einstein, A. "Zur allgemeinen Relativitätstheorie." Preuss. Akad. Wiss.
Berlin, Sitzber (1915), 799-801, 832-839, 844-847; (1916), 688-696; and
(1917), 142-152.
38. Einstein, A. As quoted by A. Forsee in Albert Einstein, Theoretical Physicist.
New York: Macmillan, 1963, 81.
39. Elsasser, W. M. Reflections on a Theory of Organisms. Frelighsburg, Quebec:
Orbis, 1987.
40. Fisher, R. A. "On the Dominance Ratio." Proc. Roy. Soc. Edin. 42 (1922):
321-341.
41. Fisher, R. A. Statistical Methods and Statistical Inference. New York: Hefner,
1956, 8-17.
42. Follesdal, D. "Meaning and Experience." In Mind and Language, edited by S.
Guttenplan. Oxford: Clarendon, 1975, 25-44.
43. Fuller, R. W., and P. Putnam. "On the Origin of Order in Behavior." General
Systems (Ann Arbor, MI) 12 (1966):111-121.
44. Fuller, R. W. "Causal and Moral Law: Their Relationship as Examined in
Terms of a Model of the Brain." Monday Evening Papers. Middletown, CT:
Wesleyan Univ. Press, 1967.
45. Green, M. B., J. H. Schwarz, and E. Witten. Superstring Theory. Cambridge,
UK: Cambridge Univ. Press, 1987.
46. Greenberger, D. M., ed. New Techniques and Ideas in Quantum Measurement
Theory. Annals of the New York Academy of Sciences, 1986, vol. 480.
47. Gross, D. J. "On the Calculation of the Fine-Structure Constant." Phys. To-
day 42(12) (1989).
48. Haken, H., ed. Pattern Formation by Dynamic Systems and Pattern Recogni-
tion. Berlin: Springer, 1979.
49. Haken, H. Information and Self-Organization: A Macroscopic Approach to
Complex Systems. Berlin: Springer, 1988.
50. Hartle, J. B., and S. W. Hawking. "Wave Function of the Universe." Phys.
Rev. D 28 (1983):2960-2975.
51. Hartle, J. B. "Progress in Quantum Cosmology." Preprint from the Physics
Department, University of California at Santa Barbara, 1989.
52. Hawking, S. W. "Particle Creation by Black Holes." Commun. Math. Phys.
43 (1975):199-220.
53. Hawking, S. W. "Black Holes and Thermodynamics." Phys. Rev. 13 (1976):
191-197.
54. Hawking, S. W. "The Boundary Conditions of the Universe." In Astrophysical
Cosmology, edited by H. A. Brück, G. V. Coyne, and M. S. Longair. Vatican
City: Pontificia Academia Scientiarum, 1982, 563-574.
55. Heisenberg, W. "Über den anschaulichen Inhalt der quantentheoretischen
Kinematik und Mechanik." Zeits. f. Physik 43 (1927):172-198. English trans-
lation in WZ,148 pp. 62-84.
56. Herken, R., ed. The Universal Turing Machine: A Half-Century Survey. Ham-
burg: Kammerer & Unverzagt and New York: Oxford Univ. Press, 1988.
57. Hetherington, N. S. Science and Objectivity: Episodes in the History of As-
tronomy. Ames, IA: Iowa State Univ. Press, 1988.
58. Hobson, J. Allan. Sleep. Scientific American Library. New York: Freeman,
1989, 86, 89, 175, 185, 186.
59. Jastrow, R. Journey to the Stars: Space Exploration-Tomorrow and Beyond.
New York: Bantam, 1989.
145. Wheeler, J. A. "On Recognizing Law without Law." Am. J. Phys. 51 (1983):
398-404.
146. Wheeler, J. A. "Jenseits aller Zeitlichkeit." In Die Zeit, Schriften der Carl
Friedrich von Siemens-Stiftung, edited by A. Peisl and A. Mohler. München:
Oldenbourg, 1983, vol. 6, 17-34.
147. Wheeler, J. A. "Elementary Quantum Phenomenon as Building Unit." In
Quantum Optics, Experimental Gravitation, and Measurement Theory, edited
by P. Meystre and M. Scully. New York and London: Plenum, 1983, 141-143.
148. Wheeler, J. A., and W. H. Zurek. Quantum Theory and Measurement. Prince-
ton: Princeton Univ. Press, 1983.
149. Wheeler, J. A. "Bits, Quanta, Meaning." In Problems in Theoretical Physics,
edited by A. Giovannini, F. Mancini, and M. Marinaro. Salerno: Univ. of
Salerno Press, 1984, 121-141. Also in Theoretical Physics Meeting: Atti del
Convegno, Amalfi, 6-7 maggio 1983, Edizioni Scientifiche Italiane, Naples
(1984), 121-134. Also in A. Giovannini, F. Mancini, M. Marinaro, and A.
Rimini, Festschrift in Honour of Eduardo R. Caianiello, World Scientific, Sin-
gapore (1989).
150. Wheeler, J. A. "Quantum Gravity: The Question of Measurement." In Quan-
tum Theory of Gravity, edited by S. M. Christensen. Bristol: Hilger, 1984,
224-233.
151. Wheeler, J. A. "Bohr's 'Phenomenon' and 'Law without Law.'" In Chaotic
Behavior in Quantum Systems, edited by G. Casati. New York: Plenum, 1985,
363-378.
152. Wheeler, J. A. "'Physics as Meaning Circuit': Three Problems." In Frontiers of Non-
Equilibrium Statistical Physics, edited by G. T. Moore and M. O. Scully. New
York: Plenum, 1986, 25-32.
153. Wheeler, J. A. "Interview on the Role of the Observer in Quantum Mechan-
ics." In The Ghost in the Atom, edited by P. C. W. Davies and J. R. Brown.
Cambridge: Cambridge Univ. Press, 1986, 58-69.
154. Wheeler, J. A. "How Come the Quantum." In New Techniques and Ideas in
Quantum Measurement Theory, edited by D. M. Greenberger. Ann. New York
Acad. Sci. 480 (1987):304-316.
155. Wheeler, J. A. "Hermann Weyl and the Unity of Knowledge." In Exact Sci-
ences and Their Philosophical Foundations, edited by W. Deppert et al. Frank-
furt am Main: Lang, 1988, 469-503. Appeared in abbreviated form in Am.
Scientist 74 (1986):366-375.
156. Wheeler, J. A. "World as System Self-Synthesized by Quantum Networking."
IBM J. Res. & Dev. 32 (1988):4-25. Reprinted in E. Agazzi, ed., Probability
in the Sciences, Kluwer, Amsterdam (1988), 103-129.
157. Wheeler, J. A. A Journey into Gravity and Spacetime. Scientific American
Library. New York: Freeman, 1990.
158. White, M. Science and Sentiment in America: Philosophical Thought from
Jonathan Edwards to John Dewey. New York: Oxford Univ. Press, 1972.
upon its surface is a signal. According to the code known as "written English" this
signal corresponds to a message (which includes, for instance, a description of the
communication process). This general picture of communication includes both the
notion of information transfer and the notion of information storage and retrieval.
Information theory as formulated by Shannon takes an essentially statistical
approach to this process. A particular message xi is chosen with probability p(xi )
from an abstract set X of possible messages. The information content of the message
is given by the information function
H(X) = -∑_i p(x_i) log p(x_i) . (1)
(All logarithms have base 2, so that H(X) is in "bits.") H(X) can be viewed as
a measure of the receiver's uncertainty about X before the signal is transmitted.
After the transmission, the receiver has examined the channel with result yk (from
a set Y of possible results) and ascribes a conditional probability p(xilyk) to each
possible message. If the channel is "noisy," the receiver may still have a non-zero
degree of uncertainty about the message X—on average, an amount
H(X|Y) = H(X,Y) - H(Y) , (2)
where H(X,Y) and H(Y) are defined by the joint distribution for X and Y and the
marginal distribution for Y, respectively. Thus, the receiver has gained an amount
of information
H(X : Y) = H(X) - H(X|Y) . (3)
p(x_i, a) = p(x_i) Tr π_a ρ(x_i). From this distribution the mutual information H(X :
A) can be calculated using Eq. 3.
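These Shannon quantities can be sketched numerically; the joint distribution below (a slightly noisy binary channel) is an illustrative assumption, not one taken from the text:

```python
import math

def entropy(probs):
    """Shannon information function: H = -sum p log2 p, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative joint distribution p(x, y) for a slightly noisy binary channel.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_x = [0.5, 0.5]   # marginal distribution of the message X
p_y = [0.5, 0.5]   # marginal distribution of the received result Y

H_X = entropy(p_x)
H_XY = entropy(joint.values())
H_X_given_Y = H_XY - entropy(p_y)   # residual uncertainty about the message
gained = H_X - H_X_given_Y          # mutual information H(X : Y), Eq. 3
print(round(gained, 3))
```

With these numbers the receiver gains about 0.28 bits per symbol; a noiseless channel would deliver the full H(X) = 1 bit.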
The ensemble of possible messages gives rise to an ensemble of possible signals.
This ensemble is described by the density operator
ρ = ∑_i p(x_i) ρ(x_i) . (4)
ρ correctly predicts the ensemble average of any quantum observable. For example,
the average signal energy is ⟨E⟩ = Tr ρH. The entropy of the signal ensemble,
defined by
S[ρ] = -Tr ρ log ρ , (5)
is a quantity with obvious analogies to the information function H(X), which is
in fact frequently called the "entropy." However, the two are quite different. In-
formation is a semantic quantity, a function of the abstract ensemble of possible
messages. Entropy is a physical quantity with a thermodynamic meaning. The re-
lation between the two is a key issue in the physics of information.
A particularly deep insight into this question is provided by a theorem of
A. S. Kholevo5 which sets a bound on H(X : A), the amount of information con-
veyed by the quantum channel Q. Kholevo showed that
H(X : A) ≤ S[ρ] - ∑_i p(x_i) S[ρ(x_i)] , (6)
with equality only if the signal states ρ(x_i) all commute with one another. Since the
subtracted term on the right is non-negative, it trivially follows that
H(X : A) ≤ S[ρ]. That is, a quantum channel Q can deliver an amount of
information no greater than the entropy of the ensemble of signals.
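Kholevo's bound is easy to evaluate for a concrete ensemble. The sketch below assumes two equiprobable, non-orthogonal pure qubit signal states (an illustrative choice) and computes the bounding quantity S[ρ] - ∑_i p(x_i) S[ρ(x_i)]:

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy S[rho] = -Tr rho log2 rho, in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

# Two equiprobable, non-orthogonal pure signal states (illustrative assumption).
v0 = np.array([1.0, 0.0])
v1 = np.array([np.cos(np.pi / 8), np.sin(np.pi / 8)])
rho0, rho1 = np.outer(v0, v0), np.outer(v1, v1)
rho = 0.5 * rho0 + 0.5 * rho1   # density operator of the signal ensemble

# Kholevo's bound on the accessible information: S[rho] - sum_i p_i S[rho_i].
chi = vn_entropy(rho) - 0.5 * vn_entropy(rho0) - 0.5 * vn_entropy(rho1)
print(round(chi, 3))   # well under one bit: the two states do not commute
```

The pure signal states contribute zero entropy, so the bound is just S[ρ], about 0.23 bits here, far less than the one bit the sender invested in choosing a message.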
I should remark that the model of measurement used by Kholevo in the proof
of this theorem is a very general one. He assumes that the decoding observable A
is a positive-operator-valued (POV) measure; that is, each measurement outcome
a is associated with a positive operator π_a in H_Q for which
∑_a π_a = 1 . (7)
The probabilities for the various measurement outcomes are given by the usual
quantum trace rule. For an ordinary measurement, the π_a's are projections—that is,
ordinary measurements are projection-valued (PV) measures. The POV measures
clearly include the PV measures as a subset.3
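For concreteness, here is a sketch of a POV measure that is not projection-valued: the three "trine" elements below (a standard construction, assumed here purely for illustration) are positive, sum to the identity as in Eq. 7, and give outcome probabilities by the trace rule:

```python
import numpy as np

# Three "trine" POVM elements pi_a = (2/3)|v_a><v_a| at 120-degree spacing
# on the Bloch circle (illustrative; not from the text).
angles = [0, 2 * np.pi / 3, 4 * np.pi / 3]
pis = []
for t in angles:
    v = np.array([np.cos(t / 2), np.sin(t / 2)])
    pis.append((2 / 3) * np.outer(v, v))

# Completeness: the elements sum to the identity, as Eq. 7 requires.
print(np.allclose(sum(pis), np.eye(2)))

# Outcome probabilities for a state rho via the trace rule; they sum to 1.
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
probs = [np.trace(p @ rho).real for p in pis]
print(np.isclose(sum(probs), 1.0))
```

Each element has rank one but trace 2/3, so none is a projection; this three-outcome measurement on a two-dimensional system is exactly the kind of decoding observable Kholevo's theorem admits.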
32 Benjamin Schumacher
CHANNEL CAPACITY
One consequence of Kholevo's theorem is that simple quantum channels cannot
hold an unlimited amount of information. Suppose that dim H_Q = N. It is al-
ways possible to increase H(X) by increasing the number of possible messages and
signals. Further, since we might allow POV measures in our class of observable
quantities, there is no limit to the number of measurement outcomes and hence no
limit to H(A). In other words, the sender can attempt to put as much information
as he wishes into the channel, and the receiver can attempt to acquire as much
information as he wishes from the channel. However, the entropy of the signal en-
semble is bounded by S[ρ] ≤ log N. Therefore, by Kholevo's theorem no possible
coding-decoding scheme can use the channel Q to convey a quantity of information
H(X : A) greater than log N. A spin-1/2 system, for example, has an information
capacity of just one bit.
This is intuitively satisfying, since we sometimes think of a spin-1/2 system as
a "two-state" system. But in fact there are an infinite number of states of a spin-
1/2 system, one for each point on the Stokes sphere (pure states) or in its interior
(mixed states). An unlimited amount of information can be coded in the spin state.
Nevertheless, the quantum state of the spin is not an observable, and the accessible
information can be no larger than a single bit.
On the other hand, since the receiver can choose the decoding observable, he has
a choice about which part of the coded information to access. This can be illustrated
by Wiesner's quantum multiplexing.10 Imagine that Q is a spin-1/2 system, and let
|+⟩ and |−⟩ be the eigenstates of σ_z. The idea is to code two distinct one-bit
messages X and Y into the channel Q. Four possible joint messages (XY = 00, 01,
11, or 10) are coded in the following four signal states:
where θ = π/8. If each message has probability 1/4, then the message information
H(XY) is two bits.
No observable can read both bits, but it is possible to read something of either
bit by a suitable choice of measurement. If the receiver measures σ_z, for example,
he can read the first bit X with an error probability of about 15%, though he learns
nothing about the second bit. That is, H(X : σ_z) = .4 bits and H(Y : σ_z) = 0. Similarly,
a measurement of σ_y yields .4 bits of information about Y but no information about
X. In each case less than one bit is received, but this deficiency can be overcome in
a long sequence of messages by the use of redundancy and error-correcting codes.
Two distinct messages can thus be coded into the complementary observables σ_z and
σ_y; the receiver can read either one, but not both.
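Taking θ = π/8 and modeling the σ_z readout as a binary symmetric channel with uniform inputs (an illustrative simplification), the roughly 15% error probability and the .4-bit figure quoted above can be reproduced:

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

theta = math.pi / 8
p_err = math.sin(theta) ** 2     # readout error probability, about 15%
info = 1.0 - h2(p_err)           # bits per signal over a symmetric binary channel
print(round(p_err, 3), round(info, 2))
```

Since sin²(π/8) ≈ 0.146, the mutual information 1 − h2(0.146) comes out to about 0.4 bits, matching the figure in the text.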
Information from Quantum Measurements 33
Notice that even the sum of the mutual informations in this example is less
than one bit. This is not accidental and is an expression of the complementarity of
the decoding observables. Maassen and Uffink7 have shown that, for any complete
ordinary (PV measure) observables A and B and any state ρ,
H(A|ρ) + H(B|ρ) ≥ C = -log ( sup_{i,j} |⟨a_i|b_j⟩|² ) , (9)
where |a_i⟩ and |b_j⟩ are eigenstates of A and B, respectively. Eq. 9 amounts to an
information-theoretic uncertainty relation for A and B, and is the strongest such
inequality yet derived for finite-state quantum systems. If dim H_Q = N, then for
any message X coded into Q,
H(X : A) + H(X : B) = H(A) + H(B) - [H(A|X) + H(B|X)]
≤ 2 log N - C . (10)
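As a sketch of Eq. 9 at work: for the complementary qubit observables σ_z and σ_x, every overlap |⟨a_i|b_j⟩|² equals 1/2, so C = 1 bit, and the two measurement entropies must total at least one bit for any state. A numerical spot-check on a randomly chosen pure state (an illustrative assumption):

```python
import numpy as np

def entropy_bits(probs):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(probs)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

# A randomly chosen pure qubit state (illustrative).
rng = np.random.default_rng(0)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)

# Outcome distributions for sigma_z (computational basis) and sigma_x.
p_z = np.abs(psi) ** 2
x_states = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
p_x = np.abs(x_states @ psi) ** 2

C = 1.0  # -log2 sup|<a_i|b_j>|^2 = -log2(1/2) for these two bases
total = entropy_bits(p_z) + entropy_bits(p_x)
print(total >= C)   # the Maassen-Uffink bound holds
```

Equality is approached only for an eigenstate of one of the two observables; a generic state keeps both distributions spread out, so the sum exceeds one bit.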
That is, the measurement of A on subsystem Q1 does not affect the statistics of any
measurement on Q2. (This is exactly the statement of locality for quantum measure-
ment theory; if it were not true, it would be possible to use quantum correlations
to send signals faster than the speed of light or into the past!)
Although there is no question of communication in this situation, there is a
formal similarity between quantum communication and quantum correlation. The
A-measurement outcomes correspond to the possible messages; for each "message"
a there is a "message probability" p(a) and a "signal state" ρ_2(a) of the "channel"
Q2. A measurement of B on the channel provides an amount of information about
A limited by Kholevo's theorem:
where P is the signal power and R is the information transmission rate. In other
words, the power requirement increases as the square of the information rate.
Kholevo's theorem sheds light on these questions by providing a limit for the
amount of accessible information that may be represented by the state of a partic-
ular quantum channel. Thus,
ACKNOWLEDGMENTS
This paper is drawn from the Ph.D. thesis work I have done under the direction
of Prof. John A. Wheeler at the University of Texas, and I would like to acknowl-
edge his continuing help and inspiration. In addition, I am greatly indebted to Bill
Wootters, Charles Bennett, Carlton Caves, Murray Gell-Mann, Leonid Khalfin, and
Wojciech Zurek for their comments and suggestions. I also wish to thank the Santa
Fe Institute for hospitality and support during the workshop.
REFERENCES
1. Bekenstein, J. D. Phys. Rev. D 23 (1981):287 ff.
2. Bennett, C. H. "The Thermodynamics of Computation—a Review." Intl. J.
Theor. Phys. 21 (1982):905-940.
3. Busch, P. Intl. J. Theor. Phys. 24 (1985):63-91.
4. Everett, Hugh, III, "The Theory of The Universal Wave Function." In The
Many-Worlds Interpretation of Quantum Mechanics, edited by DeWitt, Bryce
S. and Graham, Neill. Princeton: Princeton University Press, 1973, 3-137.
The conjecture is found on page 51.
5. Kholevo, A. S. "Bounds for The Quantity of Information Transmitted by
a Quantum Communication Channel." Problemy Peredachi Informatsii 9
(1973):3-11. This journal is translated by IEEE under the title Problems of
Information Transfer.
6. Landauer, R. IBM J. Research 3 (1961):183-191.
7. Maassen, H., and J. B. M. Uffink. "Generalized Entropic Uncertainty Rela-
tions." Phys. Rev. Lett. 60 (1988):1103-1106.
8. Pendry, J. B. J. Phys. A16 (1983):2161 ff.
9. Shannon, C. E. Bell System Technical Journal 27 (1948):379, 623.
10. Wiesner, S. SIGACT News 15 (1983):78-88.
11. Zurek, W. H., "Reversibility and Stability of Information Processing Sys-
tems." Phys. Rev. Lett. 53 (1984):391-394.
William K. Wootters
Santa Fe Institute, 1120 Canyon Road, Santa Fe, New Mexico 87501; Center for Nonlin-
ear Studies and Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM
87545; Permanent address: Department of Physics, Williams College, Williamstown, Mas-
sachusetts 01267
We show now that the three probabilities obtained in this way are sufficient
to determine the density matrix. Let P1, P2, and P3 be projection operators that
project onto the states selected by the three filters. Then the probability that a pho-
ton will pass through filter i is p_i = tr(P_i ρ), where ρ is the beam's density matrix.
The trace tr(P_i ρ) can be thought of as an inner product on the space of Hermitian
operators, so that we can think of p_i as the length of the projection of ρ along P_i.
We also know that the quantity tr(Iρ), where I is the identity matrix, is equal to
unity. We thus have the projections of ρ along four "vectors," namely, P1, P2, P3,
and I. The space of 2 x 2 Hermitian matrices is four-dimensional. Therefore, as long
as these four "vectors" are linearly independent, the four projections will uniquely
determine the density matrix. One can verify that for the three filters mentioned
above, the three Pi's and I are indeed linearly independent. On the other hand,
it would not do to use three linearly polarizing filters oriented at different angles,
since the associated projectors, together with the identity, do not constitute a set
of linearly independent matrices.
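The linear-independence argument can be rehearsed numerically: expand ρ in four basis matrices and solve for it from the measured probabilities. The concrete filter set below, two linear polarizations and one circular expressed through Stokes vectors, is an illustrative assumption:

```python
import numpy as np

# Pauli matrices and identity.
I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def projector(n):
    """Projector (I + n.sigma)/2 selecting the polarization with Stokes vector n."""
    return 0.5 * (I2 + n[0] * sx + n[1] * sy + n[2] * sz)

# Three filters (illustrative choice): two linear polarizations and one circular.
Ps = [projector(n) for n in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]

# An arbitrary "unknown" beam density matrix to be reconstructed.
r = [0.3, -0.5, 0.2]
rho_true = 0.5 * (I2 + r[0] * sx + r[1] * sy + r[2] * sz)

# Measured numbers: p_i = tr(P_i rho), plus the normalization tr(I rho) = 1.
ops = Ps + [I2]
vals = [np.trace(P @ rho_true).real for P in Ps] + [1.0]

# Expand rho = c1 sx + c2 sy + c3 sz + c4 I and solve the four linear equations.
basis = [sx, sy, sz, I2]
A = np.array([[np.trace(P @ B).real for B in basis] for P in ops])
c = np.linalg.solve(A, vals)
rho_rec = sum(ci * B for ci, B in zip(c, basis))
print(np.allclose(rho_rec, rho_true))
```

Had all three filters been linear polarizers, the matrix A would be singular and the solve would fail, which is the numerical face of the linear-dependence obstruction described above.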
We now need to figure out how many independent probabilities one obtains
from this scheme. There are nine different joint measurements that can be made
on a photon pair, that is, nine possible combinations of filters. Each of these mea-
surements has four possible outcomes: yes-yes, yes-no, no-yes, and no-no, where
"yes" means that the photon passes through. Thus one obtains 9 x 4 = 36 different
probabilities. But of course these probabilities are not all independent. For each
measurement, the probabilities of the four outcomes must sum to unity. Moreover,
the unconditional probability that a photon on one side will pass through a given
filter cannot depend on which filter the corresponding photon on the other side
happens to encounter. Quantum mechanics forbids such dependence, and indeed
such a dependence could be used to send signals faster than light. Given these
restrictions, one can convince oneself that the following probabilities constitute a
complete set of independent probabilities, in the sense that all other probabilities
can be computed from them:
p(R_i),   i = 1, 2, 3
p(L_j),   j = 1, 2, 3
p(R_i, L_j),   i, j = 1, 2, 3
Here p(R_i) is the overall probability that a right-moving photon encountering the
ith filter will pass through it, and p(R_i, L_j) is the probability that a pair of photons
encountering filters i and j (filter i on the right and filter j on the left) will both
pass through. The number of independent probabilities is thus 3 + 3 + 9 = 15, which
is precisely the number we needed to determine the density matrix. Thus our naive
scheme appears promising.
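The stronger check called for in the next paragraph, that the measured operators be linearly independent within quantum mechanics itself, can be sketched numerically; the single-photon filter projectors below are an illustrative choice:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
paulis = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]

# Three single-photon filter projectors (illustrative choice).
Ps = [0.5 * (I2 + s) for s in paulis]

# The 15 two-photon operators whose expectations the scheme measures,
# plus the identity for normalization.
ops = ([np.kron(P, I2) for P in Ps] +                 # p(R_i)
       [np.kron(I2, P) for P in Ps] +                 # p(L_j)
       [np.kron(Pi, Pj) for Pi in Ps for Pj in Ps] +  # p(R_i, L_j)
       [np.kron(I2, I2)])

# Sixteen linearly independent operators span the 16-dimensional space of
# 4x4 Hermitian matrices, so the probabilities fix rho uniquely.
M = np.array([op.flatten() for op in ops])
print(np.linalg.matrix_rank(M))
```

A rank of 16 means no measured probability is redundant, which is exactly the question the counting argument alone cannot settle.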
However, it is not enough to check that one has the right number of logically
independent probabilities. Two probabilities that are logically independent might
still be related by some restriction within quantum mechanics itself, in which case
one of them is actually redundant and does not contribute to determining the
density matrix. To make sure we have enough information to determine the density
Local Accessibility of Quantum States 43
matrix, we need to go through the kind of argument we used in the case of single
photons. Let P_i, i = 1, 2, 3, be the projectors onto the states selected by the three
kinds of filter, just as before. Then the 15 probabilities listed above are related to
the density matrix by the following equations:
p(R_i) = tr[(P_i ⊗ I)ρ]
p(L_j) = tr[(I ⊗ P_j)ρ]
p(R_i, L_j) = tr[(P_i ⊗ P_j)ρ]
DISCUSSION
The above statement actually contains two noteworthy facts.
The first is that measurements on the parts are sufficient for determining the
state of the whole. This is not a trivial result. Indeed, the conclusion would not hold
if the world were run according to real-vector-space quantum mechanics rather than
complex quantum mechanics. To see this, consider a composite system consisting
of two parts, each having two orthogonal states. In real-vector-space quantum me-
chanics, we can think of this system as a pair of photons, where each photon is
allowed to have only linear, and not elliptical polarization. Such a restriction is the
result of allowing only real amplitudes. Let ρ be any density matrix for the composite system, that is, any 4 × 4 real symmetric matrix with unit trace and non-negative
eigenvalues. Then consider any other density matrix of the form ρ' = ρ + b(σy ⊗ σy),
where b is a real number and σy is the Pauli matrix (0 −i; i 0). (For some ρ,
every non-zero b will cause one of the eigenvalues of ρ' to be negative and is therefore
not allowed. However, for a typical ρ there will be a range of allowed b's. It is
the latter case that we consider here.) I show now that the value of b cannot be
determined by any set of measurements performed on the subsystems: The prob-
abilities obtained from such measurements will always be related to the density
matrix through an equation of the form p = tr[(P ⊗ Q)ρ], where P and Q are projectors
on the two-dimensional spaces of the individual photons. It turns out that
tr[(P ⊗ Q)(σy ⊗ σy)] is always zero, so that these probabilities will never depend
on b, and therefore the value of b cannot be determined by such measurements.
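The vanishing of this trace can be checked directly: tr[(P ⊗ Q)(σy ⊗ σy)] = tr(Pσy) tr(Qσy), and tr(Pσy) = 0 for any real symmetric P. A numerical sketch (function and variable names are mine):

```python
import numpy as np

# Numerical illustration: for real, rank-1 polarization projectors P and Q,
# tr[(P x Q)(sigma_y x sigma_y)] vanishes identically, since
# tr(P sigma_y) = 0 for any real symmetric P.
rng = np.random.default_rng(0)
sigma_y = np.array([[0.0, -1.0j], [1.0j, 0.0]])

def real_projector(theta):
    """Projector onto the linear-polarization state (cos theta, sin theta)."""
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(v, v)

for _ in range(100):
    P = real_projector(rng.uniform(0.0, np.pi))
    Q = real_projector(rng.uniform(0.0, np.pi))
    val = np.trace(np.kron(P, Q) @ np.kron(sigma_y, sigma_y))
    assert abs(val) < 1e-12    # so local probabilities never depend on b
```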
The two cases mentioned above, namely, quantum mechanics with g(N) = N² − 1
and classical probability theory with g(N) = N − 1, both satisfy this condition.
So does any hypothetical theory with g(N) = N^k − 1, where k is a non-negative
integer. Thus quantum mechanics is not unique in this respect, but its g(N) belongs
to a rather special class of functions.
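The condition itself falls outside this excerpt; in Wootters' argument it amounts to requiring that local parameter counts multiply, g(N1·N2) = g(N1)g(N2) + g(N1) + g(N2), i.e., that 1 + g(N) be multiplicative. Under that assumption, a quick check that g(N) = N^k − 1 qualifies:

```python
# Assumed form of the condition (transcribed from the surrounding argument,
# not printed in this excerpt): g(N1*N2) = g(N1)*g(N2) + g(N1) + g(N2),
# i.e., 1 + g(N) is multiplicative.
def g(N, k):
    return N**k - 1

for k in range(4):               # k = 1: classical; k = 2: quantum mechanics
    for N1 in range(1, 7):
        for N2 in range(1, 7):
            assert g(N1 * N2, k) == g(N1, k) * g(N2, k) + g(N1, k) + g(N2, k)
```

Since (1 + g)(N) = N^k is multiplicative by construction, the identity holds for every non-negative integer k.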
Let me finish with a story. A man came across a beam of photons and decided
to measure their polarization. He made only those measurements that he needed to
make, but so as not to waste any information, he also recorded which photon gave
which result. (He identified them by their times of arrival.) Somewhere far away, a
woman came across a similar beam and performed the same procedure. Later, the
two observers met and were told by a third person that the photons which they had
observed were actually produced in pairs at a common source. On looking back at
their records, they discovered that they possessed precisely the information they
needed for reconstructing the polarization state of the photon pairs. They were
pleased, of course, but they also wondered what the meaning of this good fortune
might be.
46 William K. Wootters
ACKNOWLEDGMENTS
I would like to thank Ted Jacobson and Ben Schumacher for a number of ideas
that have found their way into the paper. I would also like to thank the two groups
in Los Alamos' Theoretical Division that have contributed to the support of this
work: Complex Systems T-13 and Theoretical Astrophysics T-6.
REFERENCES
1. Bell, J. S. "On the Einstein Podolsky Rosen Paradox." Physics 1 (1964):195.
2. Freedman, S. J., and J. F. Clauser. "Experimental Test of Local Hidden-
Variable Theories." Phys. Rev. Lett. 28 (1972):938.
V. F. Mukhanov
Santa Fe Institute, Santa Fe, NM, U.S.A.; permanent address: Institute for Nuclear
Research, Moscow 117312, U.S.S.R.
ω1 + ω2 = M . (1)
(We will use the units in which c = ħ = G = k = 1.) Another possible way to
form a black hole with the same mass uses three radiation quanta with frequencies
ω1, ω2, and ω3 (ω1 + ω2 + ω3 = M), etc. If there are no restrictions on the
quanta frequencies, then the number of possible ways to form a black hole with
FIGURE 1 The different ways to form a black hole at level n.
FIGURE 2 The ordered partitions 3 = 1+1+1, 3 = 1+2, 3 = 2+1, and 3 = 3 correspond to the four ways to form a black hole at level n = 3: F(3) = 4 = 2².
number "two," etc., up to level n. Another possible way to create a black hole with
quantum number n is to do it without intermediate transitions. The different ways
to form a black hole at the level n are depicted in Figure 1. There is a one-to-one
correspondence between the ways to form a black hole at level n and the subdivisions
of the integer n into ordered sums of integers. For the particular case n = 3
this correspondence is explained in Figure 2. Thus, the number of possible ways to
create a black hole at level n from the given matter, and consequently the number
of different internal configurations of this black hole, is equal to the number of
ordered subdivisions of the integer n. It is very easy to verify that this number
F(n) is equal to
F(n) = 2^(n−1) . (2)
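The count is easy to verify by enumeration: each ordered composition of n corresponds to choosing "cut" or "no cut" at each of the n − 1 internal boundaries in a row of n units, giving 2^(n−1) possibilities. A sketch (the enumeration code is mine):

```python
from itertools import product

# Each ordered composition of n corresponds to choosing "cut" or "no cut" at
# each of the n - 1 boundaries in a row of n units, giving 2^(n-1) ways.
def ordered_compositions(n):
    results = []
    for cuts in product([0, 1], repeat=n - 1):
        parts, current = [], 1
        for c in cuts:
            if c:
                parts.append(current)
                current = 1
            else:
                current += 1
        parts.append(current)
        results.append(tuple(parts))
    return results

for n in range(1, 11):
    assert len(ordered_compositions(n)) == 2 ** (n - 1)    # F(n) = 2^(n-1)

print(ordered_compositions(3))   # [(3,), (2, 1), (1, 2), (1, 1, 1)]
```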
Then the entropy of a black hole with quantum number n is
S(n) = ln F(n) = (n − 1) ln 2 . (3)
Identifying this entropy with a quarter of the horizon area (in Planck units),
S(n) = A(n)/4 + const , (4)
we arrive at the quantization rule
A(n) = 4n ln 2 . (5)
The constant in Eq. (4) was chosen such that there is no black hole when n = 0.
Thus, we found that the area of the quantized black hole is proportional to an integer
number of Planck areas. The minimal possible increase of the black hole area
is ΔA_min = 4 ln 2, in full correspondence with Bekenstein's result.2
If the black hole is quantized, then it is rather natural to consider the Hawking
radiation as a result of spontaneous quantum jumps of this black hole from one level
to other ones. It is natural also to have the maximum probability for the transition
to the nearest level (n → n − 1). As a result of such a transition, the black hole emits
a quantum of some physical field with energy ω_{n,n−1} = M(n) − M(n − 1), charge
q = Q(n) − Q(n − 1), and angular momentum ℓ = J(n) − J(n − 1). Using the first
law of the black hole thermodynamics
dM = (1/4) T_bh dA + Φ dQ + Ω dJ , (6)
and taking into account the quantization rule (5), after substitutions
(the equality dF ≈ F(n) − F(n − 1) is true for n ≫ 1), we find that for large n
(n ≫ 1) the parameters of this quantum satisfy the condition
FIGURE 3 The different levels for the nonrotating black hole without charge (Q =
J = 0). In this case, A_n ∝ n and M_n ∝ √n. Each level has finite width
W_n = γ ΔM_{n,n−1}. If γ ≪ 1, then the hypothesis about different levels is justified.
the effective action. In the first approximation, the imaginary part is proportional
to squared curvature invariants. Thus, for the width of the level n we have
where C_iklm is the Weyl tensor, ΔM_{n,n−1} is the distance between the levels n and
n − 1, and the coefficient γ characterizes the relative width of the levels (if γ < 1,
then W_n < ΔM_{n,n−1}). See also Figure 3.
The lifetime of the black hole at the level n is
τ_n ≈ 1/W_n . (9)
Then it is easy to estimate the mass loss of the black hole because of its evaporation:
dM/dt ≈ −2.8 ln 2 · γ/M² . (10)
Comparing this formula with the corresponding Hawking formulae, we find that for
the massless scalar field γ_sc.f. = 1/30; the increase of the coefficient γ due to the
other fields may not be so significant.5 Therefore, the hypothesis about black hole
levels is justified (at least for sufficiently large black holes which emit only massless
quanta).
It is worth noting one of the most interesting consequences of black hole quanti-
zation. The black hole cannot absorb a radiation quantum whose wavelength is larger
than the black hole size, because of the finite distance between nearest levels.
ACKNOWLEDGMENTS
I would like to thank W. Unruh and W. Zurek for useful conversations. This research
was supported in part by the Santa Fe Institute. I am also very grateful to Ronda
Butler-Villa for her technical assistance in preparing this document for publication.
REFERENCES
1. Bekenstein, J. D. Phys. Rev. D7 (1973):2333.
2. Bekenstein, J. D. Phys. Today 33(1) (1980):24.
3. Hawking, S. W. Nature 248 (1974):30.
4. Hawking, S. W. Comm. Math. Phys. 45 (1975):9.
5. Page, D. N. Phys. Rev. D13 (1976):198.
6. Zurek, W. H., and K. S. Thorne. Phys. Rev. Lett. 54(20) (1985):2171.
Shin Takagi
Department of Physics, Tohoku University, Sendai 980, Japan
One cannot see beyond the horizon, but on earth, one can still communicate
across it. When it comes to a spacetime horizon, such a communication is im-
possible and one necessarily loses information on that part of the spacetime that
is beyond the horizon. Under such circumstances, some unexpected consequences
could arise. The most striking consequence is discussed in the theory developed
by Wheeler,25 Bekenstein,1,2 Hawking,12 and many others that a black hole is a
thermodynamic object, as discussed by other presenters at this workshop. A closely
related situation occurs when an atom is uniformly accelerated. The purpose of my
chapter is to briefly sketch this remarkable theoretical development that emerged
from the works of Fulling,11 Davies,7 Unruh,22 and others,4,8,16 and
discuss its relationship with apparently unconnected subjects, thus elucidating a
modest but perhaps hitherto unsuspected network among some of the well-known
ideas in theoretical physics.
UNIFORM ACCELERATION
To begin, what is a uniformly accelerated observer? At each instant, a special frame
of reference (t', x', y', z') can be chosen so that the observer is momentarily at rest with
respect to it. Suppose he moves according to the Galilean law of a falling body:
x' = (1/2) g t'² . (1)
If this is the case at every instant, the observer is said to be uniformly accelerated,
with acceleration g. Described in the global frame of reference (t, x, y, z), his world
line is a hyperbola13,16
x² − (ct)² = (c²/g)² , (2)
where c is the velocity of light. No signal beyond the null ray x = ct can reach him;
this ray acts as a spacetime horizon.
Here 1/ω is a kinematical factor, and ω² comes from the density of states. This
function satisfies the detailed-balance relation (or Kubo-Martin-Schwinger condition)
F(ω) = e^(−ω/T) F(−ω) . (5)
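Eqs. (3) and (4) fall outside this excerpt; assuming the response takes the standard Planck form F(ω) = (1/2π) ω/(e^(ω/T) − 1), the KMS relation (5) can be verified numerically:

```python
import math

# Numerical check of the KMS/detailed-balance relation (5), assuming the
# response function has the Planck form F(w) = (1/2pi) w / (e^(w/T) - 1).
# (Eq. (4) itself is not reproduced above, so this form is an assumption.)
T = 0.7   # temperature in natural units; arbitrary illustrative value

def F(w):
    return (1.0 / (2.0 * math.pi)) * w / math.expm1(w / T)

for w in [0.1, 1.0, 3.7]:
    assert abs(F(w) - math.exp(-w / T) * F(-w)) < 1e-12
```

The relation holds identically, since ω/(e^(ω/T) − 1) = e^(−ω/T) · (−ω)/(e^(−ω/T) − 1).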
Consequences of Loss of Information in Spacetime with Horizon 55
EPR CORRELATION
First, the phenomenon is a result of a kind of Einstein-Podolsky-Rosen correlation,10
as has been noted by Unruh and Wald23 and others.21 As reformulated by Bohm,5
Bell,3 and others, EPR correlation manifests itself in a pure spin-state of a two-
particle system with each particle of spin 1/2. To define such a pure state, one
needs information on both of the particles. If, however, one looks at one of the
particles alone, its spin can be described equally well by an ensemble. Nevertheless,
the correlation between the spins of the two particles can be detected by measuring
the spins of these particles independently, even if the measurements occur at places
with space-like separation. Coming back to the present problem, the vacuum state
of the quantum field corresponds to a pure spin-state of the two-particle system.
To the spin of one of the particles correspond those degrees of freedom of the field
in the region z > |t| (to be called region I), and to the spin of the other particle
correspond those degrees of freedom of the field in the region z < −|t| (to be
called region II). The definition of the vacuum requires information on the field
in both regions I and II. However, the uniformly accelerated observer "sees" only
those degrees of freedom in region I. Therefore, results of his measurements can be
described by an ensemble of states. Since regions I and II are space-like apart from
each other, this is a typical case of the underlying EPR correlation.
F(ω) = (1/4π) · 1/(e^(ω/T) + 1) , (6)
where T is the same as in Eq. (3). Note that the density of states in the n-dimensional
Minkowski spacetime is proportional to ω^(n−2), which explains the numerator.
But in contrast to Eq. (4), the distribution function here is that of Fermi-Dirac,
although we are dealing with photons (i.e., bosons) in both cases. This result
has been confirmed by Unruh,24 and also found independently by Stephens.18 How
can one make sense of this apparently paradoxical result?
HUYGENS' PRINCIPLE
It is well known that Huygens' principle is valid only in even-dimensional space-
times.6 In a sense, this fact also relates to a loss of information, because an in-
habitant of the three-dimensional spacetime sees only a shadow (or projection) of
the four-dimensional spacetime in which it can be embedded. With some technical
preparation, such as the KMS condition and the fluctuation-dissipation relation,
one can show that this circumstance is closely related to the present problem of
the apparent inversion of statistics.14,21 But here I shall point out yet another
unsuspected connection.15
F(ω) = (1/4π) · D(ω)/[ω (e^(ω/T) − 1)] , (7)
sense. But our temperature is related to the position ξ of the uniformly accelerated
observer at t = 0 as
T = 1/(2πξ) . (9)
(ξ = 1/g; see Eq. (2).) If we consider a family of uniformly accelerated observers
with various accelerations, we can associate a different temperature to each observer
according to Eq. (9). The world lines of this family of observers cover the entire
region I. Indeed, introducing the coordinates (η, ξ) instead of (t, x), where ξη is
the proper time of the observer whose position at t = 0 is ξ, one can convert the
spacetime metric
ds² = dt² − dx² − dy² (10)
to the form
ds² = ξ² dη² − dξ² − dy² . (11)
In view of these considerations, one may be inclined to regard Eq. (8) as a "local
density of states" at ξ:
D(ω) = ω tanh(πωξ) . (12)
Unfortunately, the spatial inhomogeneity of region I, as manifest in Eq. (11), pre-
vents one from defining the density of states per unit volume or a unique local
density of states.
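The consistency of Eqs. (6), (7), and (12) rests on the identity tanh(ω/2T)/(e^(ω/T) − 1) = 1/(e^(ω/T) + 1): inserting D(ω) into the Bose-type form of Eq. (7) reproduces the Fermi-Dirac factor of Eq. (6). A quick numerical check (variable names and sample values are mine):

```python
import math

# Inserting the "local density of states" of Eq. (12), D(w) = w tanh(pi w xi),
# into the Bose-type form of Eq. (7) reproduces the Fermi-Dirac factor of
# Eq. (6), via tanh(w/2T) / (e^(w/T) - 1) = 1 / (e^(w/T) + 1).
xi = 0.9                        # observer's position at t = 0 (illustrative)
T = 1.0 / (2.0 * math.pi * xi)  # its temperature, Eq. (9)

def D(w):
    return w * math.tanh(math.pi * w * xi)

for w in [0.2, 1.0, 5.0]:
    bose_with_D = D(w) / (w * math.expm1(w / T))   # Eq. (7), common prefactor dropped
    fermi = 1.0 / (math.exp(w / T) + 1.0)          # Eq. (6), common prefactor dropped
    assert abs(bose_with_D - fermi) < 1e-12
```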
In spite of its appearance, the spatial section of this metric is homogeneous. Indeed,
an appropriate coordinate transformation from (ξ, y) to (χ, φ) gives
(dξ² + dy²)/ξ² = dχ² + sinh²χ dφ² , (14)
which is the metric of the two-hyperboloid. Eq. (13) thus describes the three-dimensional
open Einstein universe. Furthermore, an analytic continuation χ → iθ
transforms Eq. (14) to
dθ² + sin²θ dφ² , (15)
that is, the metric of the two-sphere.
where ν is the frequency with respect to the "time" η. Since the frequency ω refers
to the proper time ξη, it is related to ν by ν = ξω. Hence we recover Eqs. (6) and
(12). A corresponding calculation in the four-dimensional spacetime gives simply
D(ν) = ν² (17)
and leads to the result (4). In an earlier publication,15 we have noted a connection
between Eq. (6) and the statistical mechanics of a dumbbell. Here I claim that the
former can be derived from the latter by sharpening the logic of Ref. 15.
CONCLUDING REMARKS
The last section was rather technical, but the essential point is that we were led
to consider the open Einstein universe because we gave up information beyond the
horizon and restricted ourselves to spacetime region I, i.e., the Rindler wedge. One
might hope that there may still be interesting connections yet to be discovered with
other ideas.
ACKNOWLEDGMENTS
I would like to thank the Yamada Science Promotion Foundation for a travel grant
and to the Santa Fe Institute for financial support covering local expenses, which
enabled me to attend this workshop. I also express my sincere thanks for the hos-
pitality of the Santa Fe Institute and Wojciech Zurek.
REFERENCES
1. Bekenstein, J. D. "Black Holes and Entropy." Phys. Rev. D 7 (1973):2333-
2346.
2. Bekenstein, J. D. "Black-Hole Thermodynamics." Physics Today January
(1980):24-31.
3. Bell, J. S. "On the Einstein-Podolsky-Rosen Paradox." Physics 1 (1964):195-
200.
4. Birrell, N. D., and P. C. W. Davies. Quantum Fields in Curved Space. Cam-
bridge: Cambridge University Press, 1982.
5. Bohm, D. Quantum Theory. Englewood Cliffs, NJ:Prentice-Hall, 1951.
6. Courant, R., and D. Hilbert. Methods of Mathematical Physics, vol. II. New
York: Interscience Publishers, 1962.
7. Davies, P. C. W. "Scalar particle Production in Schwarzschild and Rindler
Metrics." J. Phys. A8 (1975):609-616.
8. DeWitt, B. S. "Quantum Field Theory in Curved Space." Physics Rep. 19
(1975):295-357.
9. DeWitt, B. S. "Quantum Gravity: The New Synthesis." In General Relativity,
edited by S. W. Hawking and W. Israel. Cambridge: Cambridge University
Press, 1979.
10. Einstein, A., B. Podolsky, and N. Rosen. "Can Quantum-Mechanical Descrip-
tion of Physical Reality be Considered Complete?" Phys. Rev. 47 (1935):777-
780.
11. Fulling, S. A. "Nonuniqueness of Canonical Field Quantization in Rieman-
nian Space-Time." Phys. Rev. D 7 (1973):2850-2862.
12. Hawking, S. W. "Particle Creation by Black Holes." Commun. Math. Phys.
43 (1975):199-220.
13. Misner, C. W., K. S. Thorne, and J. A. Wheeler. Gravitation. San Francisco:
Freeman, 1973.
14. Ooguri, H. "Spectrum of Hawking Radiation and the Huygens Principle."
Phys. Rev. D 33 (1986):3573-3580.
15. Ottewill, A., and S. Takagi. "Particle Detector Response for Thermal States
in Static Space-Times." Prog. Theor. Phys. 77 (1987):310-321.
16. Rindler, W. Essential Relativity, 2nd edition. New York:Springer-Verlag,
1977.
17. Sciama, D. W., P. Candelas, and D. Deutsch. "Quantum Field Theory, Horizons,
and Thermodynamics." Adv. in Phys. 30 (1981):327-366.
18. Stephens, C. R. Private communication.
19. Takagi, S. "On the Response of a Rindler Particle Detector." Prog. Theor.
Phys. 72 (1984):505-512.
20. Takagi, S. "On the Response of a Rindler Particle Detector II." Prog. Theor.
Phys. 74 (1985):142-151.
21. Takagi, S. "Vacuum Noise and Stress Induced by Uniform Acceleration."
Prog. Theor. Phys. Suppl. 88 (1986):1-142.
has generated the richness and variety of the present state occurred after the big
bang.[1]
The seemingly unidirectional advance of complex organization, or depth, im-
poses on the universe an arrow of time, which is related to, but distinct from, that
due to the second law of thermodynamics. Some people have perceived an element
of paradox in the growth of organization in a universe in which entropy always
rises. True, the former arrow does challenge the spirit of the second law, which
predicts continual degeneration. But there is no conflict with the letter of the law.
Self-organization costs entropy. But whereas entropy is a measure of information
loss, organization (or depth) refers instead to the quality of information. Entropy
and depth are not each other's negatives.
Among the more interesting complex organized systems to have arisen thus
is the human brain. Containing as it does an internal representation of the phys-
ical world, the brain stands in an unusual relationship with the world. And here
the conjunction of simplicity and complexity is inverted: the brain is incredibly
complex, but the mental states that it supports make the world seem deceptively
simple. We are able to function as human beings because our mental model of the
world bestows upon it a coherent unity. When we talk about "understanding" some
aspect of nature, we mean slotting the phenomena associated therewith into our
existing mental model of "how things are out there."
Is this process of understanding a surprise? Does it tell us anything significant
about the structure of the brain, or the world, or both? Many people have puzzled
about such issues. Why is the universe knowable? After all, given the enormous
complexity and interconnectedness of the physical world, how can we know anything
without knowing everything? Indeed, how can we know anything at all?
As a starting point in addressing these tough questions, let us agree at least on
the following statements:
n There exists a real external world which contains certain regularities. These
regularities can be understood, at least in part, by a process of rational enquiry
called the scientific method.
n Science is not merely a game or charade. Its results capture, however imper-
fectly, some aspect of reality. Thus these regularities are real properties of the
physical universe and not just human inventions or delusions.
In making these assumptions one has to eschew extreme idealistic philosophies,
such as those in which the mind somehow imposes the regularities on the world in
order to make sense of it. Unless one accepts that the regularities are in some sense
objectively real, one might as well stop doing science.
As science progresses, so some regularities become systematized as laws, or
deductions from them. At this epoch the laws found in our textbooks image only
imperfectly the actual regularities. Two points of view can be detected among
practicing scientists regarding the ontological status of these laws. The first is that
there exist "real" laws, or "the correct set" of laws, to which our current theories
"theory of everything."8 Then it will be the case that a very limited period of
mathematical development (300 or 3000 years, depending on where you start) will
have proved sufficient to encapsulate the ultimate laws of the cosmos. But this raises
the curious question of why such a glittering prize, so sweeping in its explanatory
power, demands a nontrivial, yet so astonishingly limited, amount of mathematics.
One can imagine a world in which the principles are transparent to us all at a
glance, or another world in which the principles are impenetrably complicated and
subtle. Given the limitless amount of mathematics which could (and maybe will) be
developed in the (possibly infinite) future, isn't it remarkable that one could have
all of fundamental physics wrapped up with so modest a mathematical investment?
Given that the world does require some subtle and sophisticated mathematics to
describe it, why is it (relatively!) so easy for us to achieve this unifying description?
There is another aspect to this point. Again, assuming a "theory of everything"
is within our grasp, why is it that the requisite mathematics is achievable by the
(severely limited) human brain using an education span that is less than a typical
human life span? I confess I find this exceedingly odd. The learning capabilities of
the brain, and the length of the human life span, are both dictated by Darwinian
criteria, and (presumably) have no connection whatever with the mathematical
form of the fundamental laws of the cosmos.
It is often said that, because the brain is a physical system (i.e., part of the
physical world), it is no surprise that it reflects so efficiently the workings of that
world, i.e., that it generates just that mathematics which express the very laws
of physics that govern its own activity. I consider this to be an entirely erroneous
argument, based on a confusion of conceptual levels (a muddle between hardware
and software). As I have discussed this in detail elsewhere,9 I shall here restrict
myself to a new development that has a bearing on this issue, namely, the question
of computability in physical law.
Most mathematicians subscribe to the so-called Church-Turing hypothesis,
which is to say that a Turing machine, or universal computer, can perform any
computable mathematical operation. In other words, if a mathematical problem is
solvable, a Turing machine can solve it (so long as there is no restriction on the
available memory storage space). This is usually regarded as telling us something
about the foundations of mathematics or logic, but as David Deutsch has pointed
out, it also tells us something about the physical world.10 To perform its modest
repertoire of operations, a Turing machine must employ the laws of mechanics. If
the laws of the physical universe were very different, then some operations that
are computable in our universe might no longer be. Conversely, certain operations
which are non-computable in our universe might be computable in a hypothetical
universe with different laws.
Deutsch expresses it thus:
topologies. Recently, Penrose23 has suggested that the human brain has capabilities
over and above those of a Turing machine, because humans are able to discover the
existence of true mathematical statements that no Turing machine can prove. He
claims that this ability can be traced to the influence of quantum mechanics on
brain processes. (A conventional Turing machine is a classical system.) If either of
these conjectures is correct, it would add a subtle new twist to the question of why
the universe is comprehensible to us.
There is a further tacit assumption running through all these arguments, which
is that the laws of physics are timeless eternal truths. But the intimate relationship
between physics and computation which is emerging from such studies challenges
that assumption. If nature can be viewed as a computational process ("the universe
is a computer" according to Fredkin13), then the form of the physical laws might
be constrained by what can be in principle computed. This point has been made
by Landauer.21 One then has to address the question of the computational limits
of the cosmos. If something cannot be computed by the entire universe during the
age of the universe, in what sense can it be said to be computable?
Might this imply that the laws of physics somehow "fade away" as one goes
back towards the initial singularity, on account of the fact that the computational
power of the universe tends to zero as t → 0? Such a possibility has been suggested
by Lloyd and Pagels. If so, then the laws of physics, along with the state of the
universe, would evolve with time. The laws would somehow emerge from the big
bang, and gradually "congeal" into their timeless form. Such a speculation is not
new, of course; one of its more eloquent proponents is John Wheeler.[4]
To make this more concrete, I should like to point out that one may obtain
a natural measure of the information capacity of the cosmos using the Hawking-Bekenstein
formula for black hole entropy.4,19 If the entire universe were converted
into a black hole, it would conceal a quantity of information I_U given by
I_U ~ GM_U²/(ħc) ,
where M_U is the mass of the observable universe (i.e., within the particle horizon).
At the current epoch, I_U ~ 10^120.
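Plugging in round numbers reproduces the quoted order of magnitude; the figure M_U ~ 10^53 kg below is an assumed value for the mass within the particle horizon, not one taken from the text.

```python
import math

# Order-of-magnitude sketch of I_U ~ G M_U^2 / (hbar c), with an assumed
# round figure for the mass of the observable universe.
G = 6.674e-11      # m^3 kg^-1 s^-2
hbar = 1.055e-34   # J s
c = 2.998e8        # m / s
M_U = 1e53         # kg (assumption, not from the text)

I_U = G * M_U**2 / (hbar * c)
assert 1e120 < I_U < 1e123     # of order 10^120, as stated in the text
```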
REFERENCES
1. Barbour, J. B. "Maximal Variety as a New Fundamental Principle of Dynam-
ics." Found. Phys. 19 (1989):1051.
2. Barrow, J. D., and F. J. Tipler. The Anthropic Cosmological Principle. Ox-
ford: Oxford University Press, 1986, section 10.6.
3. Barrow, J. D. The World Within the World. Oxford: Oxford University Press,
1988, 292.
4. Bekenstein, J. D. "Black Holes and Entropy." Phys. Rev. D 7 (1973):2333.
5. Bennett, C. H. "On the Nature and Origin of Complexity in Discrete, Homo-
geneous, Locally Interacting Systems." Found. Phys. 16 (1986):585.
6. Chaitin, G. J. Algorithmic Information Theory. Cambridge: Cambridge Uni-
versity Press, 1987.
7. Davies, P. C. W. The Cosmic Blueprint. London/New York: Heinemann/
Simon & Schuster, 1988.
8. Davies, P. C. W., and J. R. Brown, eds. Superstrings: A Theory of Every-
thing? Cambridge: Cambridge University Press, 1988.
9. Davies, P. C. W. "Why is the Universe Knowable?" Science and Mathemat-
ics, edited by R. Mickens. Singapore: World Scientific, 1990.
10. Deutsch, D. "Quantum Theory, the Church-Turing Principle and the Univer-
sal Quantum Computer." Proc. Royal Soc. Lond. A 400 (1985):97.
11. Feynman, R. P. The Character of Physical Law. London: BBC Publications,
1965, 172.
12. Ford, J. "What is Chaos, that We Should Be Mindful of It." The New
Physics, edited by P. C. W. Davies. Cambridge: Cambridge University Press,
1989, 348.
13. Fredkin, E. This volume.
14. Geroch, R., and J. B. Hartle. "Computability and Physical Theories." Found.
Phys. 16 (1986).
15. Halliwell, J.J. "Information Dissipation in Quantum Cosmology and the
Emergence of Classical Spacetime," this volume, and the reviews cited
therein.
16. Hamming, R. W. "The Unreasonable Effectiveness of Mathematics." Amer.
Math. Monthly 87 (1980):81.
17. Hartle, J. B., and S. W. Hawking. "The Wave Function of the Universe."
Phys. Rev. D 28 (1983):2960.
18. Hartle, J. B. "Excess Baggage." Talk given at the 60th Birthday Celebration
for Murray Gell-Mann, Pasadena, Jan. 27, 1989.
19. Hawking, S.W. "Particle Creation by Black Holes." Commun. Math. Phys.
43 (1975):199.
20. Hawking, S. W. "Is the End in Sight for Theoretical Physics?" Inaugural Lec-
ture for the Lucasian Chair, University of Cambridge, 1979.
21. Landauer, R. "Wanted: A Physically Possible Theory of Physics." IEEE
Spectrum 4 (1967):105.
70 P. C. W. Davies
1. INTRODUCTION
Algorithmic information content (also known as algorithmic randomness) of a phys-
ical entity is given by the size, in bits, of the most concise message (e.g., of the
shortest program for a universal computer) which describes that entity with the
requisite accuracy. Regular systems can be specified by means of concise descrip-
tions. Therefore, algorithmic information content can be regarded as a measure of
disorder.
Algorithmic randomness is defined without a recourse to probabilities. It pro-
vides an alternative to the usual ensemble measures of disorder: it quantifies ran-
domness of the known features of the state of the physical system. I shall demon-
strate that it is indispensable in formulating thermodynamics from the viewpoint
of the information gathering and using system ("IGUS")—a Maxwell's demon-like
entity capable of performing measurements and of modifying its strategies (for
example, for extraction of useful work) on the basis of the outcomes of the mea-
surements. Such an IGUS can be regarded as a "complex adaptive system." The
aim of this paper is to review the concept of the algorithmic information content
in the context of statistical mechanics and discuss its recently discovered physical
applications.
2. OVERVIEW
Algorithmic randomness, an alternative measure of the information capacity of a
specific physical or mathematical object, was independently introduced in the mid-
60's by Solomonoff,22 Kolmogorov,13 and Chaitin.5 It is based on an intuitively
appealing idea that the information content is equal to the size, in bits, of the
shortest description. Formalization of this idea will be briefly described in the next
section: In its development, it draws on the theory of algorithms and in the process
makes use of the theory of computation,7,19 establishes a firm and useful connection
with Shannon's theory of information,12,20 and benefits from its implications for
coding.
Applications of the algorithmic measures of the information content were ini-
tially mostly mathematical in nature. More recently, Bennett, in an influential
paper,1 has pointed out that the average algorithmic entropy of a thermodynamic
ensemble has the same value as its statistical (ensemble) entropy and, consequently,
one could attempt to build a consistent thermodynamics on an algorithmic foun-
dation.
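The information-theoretic core of Bennett's observation can be sketched with Shannon code lengths standing in for the (uncomputable) algorithmic complexities: an ideal code assigns outcome i a description of about −log2 p_i bits, so the ensemble-averaged description length tracks the statistical entropy to within one bit. The probabilities below are arbitrary illustrative numbers.

```python
import math

# Toy version of the correspondence: a Shannon code assigns outcome i a
# codeword of ceil(-log2 p_i) bits, so the mean description length lies
# within one bit of the ensemble (BGS/Shannon) entropy.
p = [0.4, 0.3, 0.2, 0.1]
H = -sum(q * math.log2(q) for q in p)                    # ensemble entropy, bits
avg_len = sum(q * math.ceil(-math.log2(q)) for q in p)   # mean description length
assert H <= avg_len < H + 1.0
```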
I have applied algorithmic randomness to the problem of measurement as seen
by an observer.25,26,27
Following a measurement, an observer (IGUS) is in possession of a specific
record. From its intrinsic point of view, this record is quite definite. Therefore, fur-
ther analysis of the measurement in terms of the ensemble language is pointless.
Algorithmic Information Content, Church-Turing Thesis, and Physical Entropy 75
Rather, the observer must deal with the specific measurement outcomes and with
their implications for extraction of useful work. In this "Maxwell's demon" context,
algorithmic randomness assumes, in part, a function analogous to the Boltzmann-
Gibbs-Shannon entropy: From the observer's point of view, the second law of ther-
modynamics must be formulated by taking into account both the remaining igno-
rance (measurements are typically far from exhaustive) and the randomness in the
already available data. Thus, the physical entropy, the quantity which allows for the
formulation of thermodynamics from the viewpoint of the observer, must consist of
two contributions:
is its BGS entropy, and K(ρ) is the size of the shortest program capable of describing
ρ:
K(ρ) = |ρ*| . (2.3)
This recent proposal for physical entropy will be described in more detail in sections
3 and 4.
In section 4, I shall also discuss the importance of "compression" of the acquired
data to their most concise form: thermodynamic efficiency of an IGUS-operated en-
gine depends on its ability to find concise measurement descriptions. In turn, this
efficiency can be regarded as a consequence of the IGUS's ability to "understand"
or "model" the part of the Universe employed as the "engine" in terms of the regu-
larities which can be regarded as analogs of physical laws. In this sense, "intellectual
capabilities" of an IGUS are quite critical for its "success."
01010101010101010101 (3.1)
76 W. H. Zurek
and
10110100101100010111. (3.2)
The first system is "regular": It can be simply and concisely described as 10 "01"s.
There is no equally concise description of the second spin system. To reconstruct
it, one would have to have a "verbatim" description (Eq. 3.2); there is no way to
"compactify" this description into a more concise message.
The concept of algorithmic information content (known also as the algorithmic
randomness, algorithmic complexity, or algorithmic entropy) captures this intuitive
difference between the "regular" and "random" binary sequences.
Algorithmic information content of a binary sequence s is defined as the size of
the minimal program, s*_U, which computes sequence s on the universal computer
U:

K_U(s) = |s*_U| .   (3.3)
Above, vertical lines indicate the size of the binary string in bits. It is important to
note that this definition of the algorithmic information content makes it explicitly
subjective in the sense that it is computer dependent; hence, the subscript U in
Eq. (3.3). However, by the very definition of the universal computer, programs
executable on U can also be executed (and will yield the same output) on any other
universal computer U', provided that they are preceded by a prefix τ_{U'U}, which
depends on U and U', but not on the program. Hence, algorithmic randomness
defined for two different computers will differ by at most the size of the prefix[1]:

K_{U'}(s) ≤ K_U(s) + |τ_{U'U}| .
[1] Note that here we have used the vertical lines in two different ways: on the left-hand side they
stand for the absolute value, while on the right-hand side they indicate the size of the binary string in
bits. Only this second meaning will be employed below.
FIGURE 1 Turing machine T uses a set of instructions residing inside its "central
unit" as well as the input it reads in by means of the "head" scanning the input tape
to modify the content of the tape. A Universal Turing machine U can simulate any
other Turing machine by reading on the input tape the "description" of T. In particular,
a single-tape U can simulate operations of the modern computers, which can be
modelled as "polycephalic" Turing machines with access to several multidimensional
tapes and other kinds of hardware. Such "modern" machines (one of which is illustrated
above) may be more convenient to use, but their capabilities are limited to the
same range of "computable" tasks as for the original, one-tape U. This universality
justifies the importance attached to the universal computers. In particular, it limits the
subjectivity of the algorithmic information content defined by means of the minimal
program, Eq. (3.3).
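The simulation of one machine by another, described in the caption, can be sketched in a few lines. The table encoding of the machine and the bit-flipping example below are my own illustrative assumptions, not constructions from the text:

```python
def run(machine, tape, state="A", steps=500):
    """Minimal Turing-machine interpreter. 'machine' maps
    (state, symbol) -> (write, move, next_state); feeding such a
    description in as data is the essence of a universal machine U
    simulating an arbitrary machine T."""
    cells = dict(enumerate(tape))
    head = 0
    for _ in range(steps):
        if state == "halt":
            break
        write, move, state = machine[(state, cells.get(head, "_"))]
        cells[head] = write
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells)).strip("_")

# A toy T: flip every bit of the input, halt at the first blank ("_").
flipper = {
    ("A", "1"): ("0", "R", "A"),
    ("A", "0"): ("1", "R", "A"),
    ("A", "_"): ("_", "R", "halt"),
}
assert run(flipper, "1011") == "0100"
```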
Probability associated with the output o will be given by the sum over all inputs
which yield the given output o:

P(o) = Σ_{s: U(s)=o} 2^(−|s|) .   (3.9)

The dominant contribution to this sum will come from the shortest program. Hence,

K(o) ≅ −log₂ P(o) .   (3.10)
This connection between the probability that a given string will be generated
by a random input and its algorithmic information content can be employed in
proving that the average algorithmic randomness of a member of a simply described
("thermodynamic") ensemble is almost identical to its Boltzmann-Gibbs-Shannon
entropy.1,4,26 The relevant double inequality

H({p(si)}) ≤ Σ_i p(si) K(si) ≤ H({p(si)}) + K(si, {p(si)}) + O(1)   (3.14)

has been demonstrated by Levin16,17 and Chaitin5 (see, also, Bennett3 for a more
accessible presentation and Caves4 for a discussion in the physical context). Above,
K(si, {p(si)}) is the size of the minimal description of the ensemble. Bennett3 has
pointed out that, in the case when the BGS entropy is large compared with the size
of the ensemble description:
H({p(si)}) = − Σ_{si} p(si) log₂ p(si) >> K(si, {p(si)}) ,   (3.15)
one could base thermodynamic formalism on the average algorithmic randomness
of the ensemble.
In a recent paper I have considered an application of algorithmic randomness
to the situation in which an observer attempts to extract maximum useful work
from the system on the basis of partial measurements.26 In the next section I shall
discuss this situation, which forces one to consider physical entropy defined as the
sum of the remaining ignorance H({p(si)}) and of the cost of storage of the available
information K(si, {p(si)}).
The last quantity which can be defined in the algorithmic context is the al-
gorithmic information distance given by the sum of the conditional information
contents:
Δ(s, t) = K(s|t) + K(t|s) .   (3.16)
Algorithmic information distance satisfies the requirements expected of a metric.25
In addition to the "simple" distance defined by Eq. (3.16), one can consider
several related quantities. For example,
K(s!t!u) = K(s|t,u) + K(t|u,s) + K(u|s,t)   (3.17)

is also positive, reflexive, and satisfies the obvious generalization of the triangle inequal-
ity. Hence, K(s!t!u) and its further generalizations involving more strings can be
regarded as direct extensions of Δ(s,t) = K(s!t).
It is sometimes useful to express distance as the difference between the joint
and mutual information content
Δ'(s, t) = K(s,t) − K(s:t) ,   (3.18)

where the mutual information is given by

K(s:t) = K(s) + K(t) − K(s,t) .   (3.19)
The quantity Δ' defined by Eq. (3.18) differs from the "original" distance in Eq.
(3.16) by logarithmic terms because of the similar logarithmic errors entering into
Eq. (3.10). The advantage of employing Eq. (3.18) is its intuitive appeal: The dis-
tance between two binary strings is the information which they contain but do not
share.
Mutual information can also be used to define algorithmic independence of two
strings: s and t are independent when K(s:t) is small; for example,

K(s:t) << min (K(s), K(t)) .
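These definitions can be mimicked numerically with a real compressor standing in for K; compressed sizes approximate the algorithmic quantities only up to overhead, and the helper names below are my own illustrative choices:

```python
import random
import zlib

def C(data):
    """Compressed size in bytes -- a computable stand-in for K."""
    return len(zlib.compress(data, 9))

def distance(s, t):
    """Proxy for Eq. (3.18): Delta'(s,t) = K(s,t) - K(s:t), which by
    Eq. (3.19) equals 2*K(s,t) - K(s) - K(t)."""
    return 2 * C(s + t) - C(s) - C(t)

rng = random.Random(1)
a = bytes(rng.randrange(2) for _ in range(400))  # a random bit-string
b = a                                            # shares all information with a
c = bytes(rng.randrange(2) for _ in range(400))  # independent of a

# Strings that share their information are "close"; independent ones are not.
assert distance(a, b) < distance(a, c)
```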
Information distance can also be defined for statistical (that is, BGS) entropy.
In this case, Δ and Δ' coincide. Indeed, information distance was independently
discovered in the domain of the Shannon's information theory by at least three
authors before it was discussed (again without the benefit of knowledge of these
references) by this author25 in the algorithmic context.
The connection between the mathematical model of an "intelligent being" and ther-
modynamics goes back to the above-mentioned paper by Szilard.23 In the analysis
of the famous one-gas-particle engine (Figure 3), Szilard concluded that the second
law could indeed be violated by a fairly simple "demon" unless the cost of mea-
surements is no less than kBT per bit of acquired information. Further, essential
clarification of the situation is due to the recent work by Bennett,1,2 who, basing his
discussion on the earlier considerations of Landauer14,15 on the costs of information
erasure, concluded that it is the "resetting" of the measuring apparatus which is
thermodynamically expensive and must be responsible for restoring the validity of
the second law in Szilard's engine. (Indeed, this observation was anticipated, if only
in a somewhat half-hearted manner, by Szilard.23)
Algorithmic randomness proved essential in attempts to generalize this discus-
sion of Maxwell's demon.4,25,26,27 The validity of the original argument about the
cost of erasure was limited to the context of Szilard's engine. In that case, the out-
come of the measurement can always be described by a single bit. Hence, the gain
of useful work in the course of the expansion is given by

ΔW₊ = kBT .   (4.1)
Note that above we are using Boltzmann constant kB which differs from the usual
one by a factor of In 2. This distinction reflects the difference between entropy
measured in "bits" and "nats."
This gain of useful work is "paid for" by ΔW₋, the energy needed to restore
the memory part of the "brain" of the "demon" to the "blank," "ready to measure"
state:

ΔW₋ = −kBT .   (4.2)
FIGURE 3 Szilard's engine employs a one-molecule gas in contact with a heat bath
at temperature T to extract kBT ln 2 of work per cycle (which is illustrated in a self-
explanatory manner above). The measurement which establishes the location of the
molecule is crucial. The importance of the cost of erasure for the proper accounting for
the net energy gain is discussed in the text.
It is, nevertheless, far from clear how to apply this "cost of erasure" argument to
less idealized and more realistic situations.
One simple (although not very realistic) generalization is to consider a sequence
of measurements on the Szilard's engine and to postpone the "erasure" indefinitely.
This requires a demon with a significant memory size. One can then, as noted by
Bennett,I,2 use Szilard's engine to extract kBT of work per cycle as long as there is
"empty" tape. This is, of course, only an apparent violation of the second law since
the empty tape can be regarded as a zero-entropy (and, hence, zero-temperature)
reservoir. Consequently, an ideally efficient engine can, in accord with the second
law and, in particular, with the Carnot efficiency formula, attain exactly kBT of
work per cycle.
The cost of erasure does not have to be paid for as long as the "memory tape" is
available. However, for this very reason, the process is not truly cyclic: the demon's
memory is never restored to the initial "blank" state. The gain of useful work is paid
for by the "clutter" in its "brain." If the outcomes of consecutive measurements
are random, getting rid of this clutter would cost kBT per bit, and all the apparent
gain of work would have to be "paid back" by the final costs of erasure.
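The bookkeeping of this paragraph can be sketched directly. The function and parameter names are illustrative, and kB T is absorbed into the unit of work, as in the text's "bit" units:

```python
K_B_T = 1.0  # work per bit of record, in units where k_B T = 1

def szilard_run(cycles, blank_tape):
    """Run the engine while postponing erasure: each cycle extracts
    k_B T of work but fills one blank cell of the memory tape with the
    (random) one-bit measurement record. Erasing the record at the end
    costs k_B T per bit."""
    work = 0.0
    used = 0
    for _ in range(min(cycles, blank_tape)):
        work += K_B_T   # isothermal expansion after the measurement
        used += 1       # one-bit record: which side the molecule was on
    erasure_cost = used * K_B_T
    return work, erasure_cost

w, cost = szilard_run(cycles=100, blank_tape=100)
assert w == cost   # for random outcomes the gain is fully "paid back"
```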
Each measurement results in filling up n blanks of the tape with 0's and 1's. Hence,
the cost of erasure would be

ΔW₋ = −n kBT .
Again, one could postpone erasures indefinitely and just "dump" all of the "clut-
tered-up" tape into the garbage can. In the final count, however, the cost of erasure
(linear in n) would outweigh the gain of useful work (which is logarithmic in n).
FIGURE 4 A Turing machine converts the raw measurement record on its input tape (the "tape supply") into a "compressed record."
There is, however, a limit on the compressibility, which relates the average size ⟨ΔK⟩
of the record with the decrease ΔH of the statistical entropy of the measured system
via an inequality:

⟨ΔK⟩ ≥ ΔH .   (4.6)

For, unless this inequality holds, the gain of useful work, which is equal to

ΔW₊ = kBT ΔH ,

would exceed the cost of erasing the record, kBT⟨ΔK⟩. Hence, the net average gain
of useful energy per cycle would be

⟨ΔW⟩ = kBT (ΔH − ⟨ΔK⟩) .

The second law demands that ⟨ΔW⟩ ≤ 0, which leads to the inequality (4.6).
Fortunately, this inequality is indeed respected: It is an immediate consequence
of the left-hand side of the inequality in Eq. (3.14). Indeed, it follows from the first
basic result of Shannon's theory of communication (the so-called noiseless channel
coding theorem; see Shannon and Weaver,20 Khinchin,12 Hamming,10 and Caves4
for discussion): The average size of minimal "descriptions" needed to unambigu-
ously describe measurement outcomes cannot be made smaller than the statisti-
cal entropy of the "source" of information (in our case, of the measured physical
system). In this context, the second law can be regarded as a direct consequence
of the Kraft inequality,10 which plays a basic role in coding theory and thus enters
physics25,26: Suppose that {Ki} are the sizes (in the number of bits) of distinct
symbols (programs) {si} which correspond to different signals (measurement
outcomes). Then one can prove that in order for the encoding to be uniquely
decodable, the following inequality must be obeyed:

Σ_i 2^(−Ki) ≤ 1 .   (4.10)
The inequality (4.6) follows from the Kraft inequality (4.10), since (4.10) can be
immediately rewritten as

log₂ Σ_i p(si) (2^(−Ki) / p(si)) ≤ 0 ,

where p(si) are the probabilities corresponding to signals (states) si. Now, employ-
ing convexity of the logarithm, one can write

0 ≥ log₂ Σ_i p(si) (2^(−Ki) / p(si)) ≥ Σ_i p(si) log₂ (2^(−Ki) / p(si)) = H({p(si)}) − Σ_i p(si) Ki ,

so that the average record size satisfies Σ_i p(si) Ki ≥ H({p(si)}), which is the
content of the inequality (4.6).
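The chain of inequalities above is easy to check for a concrete dyadic ensemble (an illustrative sketch; the probabilities and helper names are my own choices):

```python
import math

def shannon_fano_lengths(probs):
    """K_i = ceil(-log2 p_i): lengths of uniquely decodable code words;
    by construction 2^-K_i <= p_i, so the Kraft inequality holds."""
    return [math.ceil(-math.log2(p)) for p in probs]

probs = [0.5, 0.25, 0.125, 0.125]
K = shannon_fano_lengths(probs)

kraft_sum = sum(2.0 ** (-k) for k in K)
H = -sum(p * math.log2(p) for p in probs)
avg_K = sum(p * k for p, k in zip(probs, K))

assert kraft_sum <= 1.0   # Kraft inequality, Eq. (4.10)
assert avg_K >= H         # average record size bounded by entropy, Eq. (4.6)
```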
FIGURE 5 The effect of measurements on (i) the Shannon entropy Hd in the presence
of the partial information—data d; (ii) the algorithmic information content of the data,
K(d); and (iii) the physical entropy Sd ≈ Hd + K(d), which measures the
net amount of work that can be extracted from the system given the information
contained in the data d. (a) When the measurements are carried out on the equilibrium
ensemble, the randomness in the data increases at the rate given by the decrease of
ignorance. (b) For systems far from equilibrium the increase of randomness is smaller
than the decrease of ignorance, which allows the observer to extract useful work and
makes measurements energetically attractive.
ACKNOWLEDGMENTS
I would like to thank Charles Bennett, Carl Caves, Stirling Colgate, Doyne
Farmer, Murray Gell-Mann, James Hartle, Rolf Landauer, Seth Lloyd, Bill Unruh,
and John Wheeler for stimulating and enjoyable discussions on the subject of this
paper. The warm hospitality of the Aspen Center for Physics, the Institute for
Theoretical Physics in Santa Barbara, and the Santa Fe Institute is very much
appreciated.
REFERENCES
1. Bennett, C. H. Int. J. Theor. Phys. 21 (1982):305-340.
2. Bennett, C. H. Sci. Am. 255(11) (1987):108-117.
3. Bennett, C. H. In The Universal Turing Machine-A Half-Century Survey,
edited by R. Herkin. Oxford: Oxford University Press, 1988.
4. Caves, C. M. This volume.
5. Chaitin, G. J. J. ACM 13 (1966):547-569.
6. Chaitin, G. J. Sci. Am. 232(5) (1975):47-52.
7. Davis, M. Computability and Unsolvability. New York: Dover, 1973.
8. Feynman, R. P., R. B. Leighton, and M. Sands. Feynman Lectures on Physics,
sect. 46, vol. 1. Reading, MA: Addison-Wesley, 1964.
9. Gödel, K. Monatsh. Math. Phys. 38 (1931):173-198.
10. Hamming, R. W. Coding and Information Theory. Englewood Cliffs: Prentice-
Hall, 1987.
11. Hofstadter, D. Gödel, Escher, Bach. New York: Random House, 1979.
12. Khinchin, A. I. Information Theory. New York: Dover, 1957.
13. Kolmogorov, A. N. Information Transmission 1 (1965):3-11.
14. Landauer, R. IBM J. Res. Dev. 3 (1961):113-131.
15. Landauer, R. In Signal Processing, edited by S. Haykin. New York: Prentice-
Hall, 1989, 18-47.
16. Levin, L. A. Dokl. Akad. Nauk SSSR 227 (1976).
17. Levin, L. A. Sov. Math. Dokl. 17 (1976):522-526.
18. Penrose, R. The Emperor's New Mind. Oxford: Oxford University Press,
1989.
19. Rogers, H. Theory of Recursive Functions and Effective Computability. New
York: McGraw-Hill, 1967.
20. Shannon, C. E., and W. Weaver. The Mathematical Theory of Communica-
tion. Urbana: Univ. of Illinois Press, 1949.
21. Smoluchowski, M. In Vorträge über die kinetische Theorie der Materie und
der Elektrizität. Leipzig: Teubner, 1914.
22. Solomonoff, R. J. Info. & Control 7 (1964):1-22.
23. Szilard, L. Z. Phys. 53 (1929):840-856.
24. Turing, A. M. Proc. Lond. Math. Soc. 42 (1936):230-265.
25. Zurek, W. H. Nature 341 (1989):119-124.
26. Zurek, W. H. Phys. Rev. A 40 (1989):4731-4751.
27. Zurek, W. H. In Proceedings of the International Symposium on Quantum
Mechanics, edited by Y. Murayama. Tokyo: Physical Society of Japan, 1990.
Carlton M. Caves
Center for Laser Studies, University of Southern California, Los Angeles, California
90089-1112
Suppose the memory examines the system and discovers which state it is in. The
average information that the memory acquires is quantified by the Gibbs-Shannon
statistical information9,22

H = − Σ_j p(j) log p(j) .   (1.1)
STATISTICAL INFORMATION
Consider, however, a "system" that can occupy one of 𝒥 states, labeled by an index
j, and let J be the set of values of j. Think of J as a property or quantity of the
system, which takes on the values j. A good example to keep in mind is a physical
system with a macroscopic number of degrees of freedom, f ~ 10^24 ≈ 2^80; the
states are system microstates—classically, phase-space cells; quantum mechanically,
perhaps energy eigenstates.
Suppose that a memory, based on prior information π, assigns probability
p(j|π) to find the system in state j. How much information is "stored" in the
system (or in the quantity J)? Evidently, the information stored is not a prop-
erty of the system alone; it must be defined relative to the information that the
memory already has. For example, if the memory assigns uniform probabilities
p(j|π) = 𝒥^(−1), it regards the system as storing log 𝒥 bits (the amount of "space"
available in the system), but if the memory knows which state the system is in, it
regards the system as storing no information. Probabilities do play a role now.
Linguistic precision, therefore, requires dropping the system-specific term "in-
formation stored." The question should be rephrased as, "How much information
does the memory acquire when it finds the system in a particular state?" The an-
swer, on the average, is well known—it is the statistical information (1.1)—but a
brief review highlights what the answer means.
Consider, then, a Gibbs ensemble consisting of N systems, distributed among
states according to the probabilities p(j|π). When N is very large, the only ensemble
configurations with nonvanishing probability are those for which the states j occur
with frequencies p(j|π). Each such configuration has probability

P ≈ Π_j [p(j|π)]^(N p(j|π)) = 2^(−N H(J|π)) .
The statistical information H(J|π) is the code-word length per system needed to code
an arbitrarily large Gibbs ensemble of systems. This way of thinking makes direct
contact with the primitive notion of "space" as the amount of information, but
it has disadvantages associated with the use of a Gibbs ensemble. Since the code
words are assigned to ensemble configurations, there is no way to consider individual
members of the ensemble. In particular, it is impossible to identify the information
that the memory acquires when it finds a member of the ensemble in a particular
state.
This difficulty can be circumvented by considering variable-length "instanta-
neous" coding.8 One assigns to each system state j a distinct binary code word,
whose length can vary from one state to another. A message in this code is a
sequence of code words, signifying a sequence of system states. The code words
make up an instantaneous or prefix-condition code if no code word is a prefix for
any other code word. (An instantaneous binary code can also be viewed profitably
as a dichotomic tree search.8,25) A message in an instantaneous code is uniquely
decodable with no requirement for end markers to separate successive code words.
Although instantaneous codes are not the only codes with this property, they are
special in the following sense: a message in an instantaneous code is uniquely de-
codable as soon as each code word is completed.
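The defining property—each code word recognized the moment it is completed—can be demonstrated in a few lines (the codebook and function names below are illustrative choices):

```python
def decode(message, codebook):
    """Decode an instantaneous (prefix-condition) code: no end markers
    are needed, and each word is decoded as soon as it completes."""
    inverse = {w: s for s, w in codebook.items()}
    out, word = [], ""
    for bit in message:
        word += bit
        if word in inverse:        # word complete -> emit immediately
            out.append(inverse[word])
            word = ""
    assert word == "", "message ended in the middle of a code word"
    return out

# No code word is a prefix of any other.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
assert decode("0101100111", code) == ["a", "b", "c", "a", "d"]
```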
An immediate consequence of the convexity of the log function is that, no mat-
ter what the probability assignment p(j|π), the average length of an instantaneous
binary code is not less than the statistical information:8,25

Σ_j p(j|π) ℓ_j ≥ H(J|π) .   (2.4)
Equation (2.4) provides a strict lower bound on the average word length, but it gives
no hint how closely this lower bound can be approached. There is an explicit pro-
cedure, due to Huffman,8,11,25 for constructing an optimal code—one with smallest
average length. Huffman's procedure is an optimal realization of the idea that one
should assign long code words to states with low probability and short code words to
states with high probability. More useful here is a non-optimal coding procedure,8,25
which Zurek27 calls Shannon-Fano coding, for which an upper bound on average
length can be established. In Shannon-Fano coding, one assigns to state j a code
word whose length ℓ_j is the smallest integer greater than or equal to −log p(j|π); thus
the length satisfies

−log p(j|π) ≤ ℓ_j < −log p(j|π) + 1 .   (2.5)
That such a code can be constructed follows from a condition known as the Kraft
inequality8,25: there exists an instantaneous binary code for a particular set of code-
word lengths if and only if those lengths satisfy Σ_j 2^(−ℓ_j) ≤ 1. Shannon-Fano coding
satisfies the Kraft inequality as a consequence of the left inequality in Eq. (2.5).
Averaging the right inequality in Eq. (2.5), one finds that Shannon-Fano codes obey
the inequality

Σ_j p(j|π) ℓ_j < H(J|π) + 1 .   (2.6)
Optimal codes, which cannot have greater average code-word length, satisfy the
same inequality, although the word lengths of an optimal code do not necessarily
satisfy Eq. (2.5).
Combining Eqs. (2.4) and (2.6) yields upper and lower bounds that relate av-
erage code-word length for optimal and Shannon-Fano codes to statistical informa-
tion:

H(J|π) ≤ Σ_j p(j|π) ℓ_j < H(J|π) + 1 .   (2.7)
All instantaneous codes obey the lower bound, whereas the upper bound is a ques-
tion of existence: there exist codes, such as Huffman and Shannon-Fano codes,
whose average length lies within the upper bound. One may interpret the code-word
length ℓ_j as the amount of information that the memory acquires when it finds the
system in state j; given the bounds (2.7), the average information acquired by the
memory is close to the statistical information. Indeed, for a macroscopic physi-
cal system near thermodynamic equilibrium, the statistical information is roughly
H ∼ f ≈ 2^80, so the statistical information and the optimal average code-word
length are essentially identical.
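Huffman's merging procedure mentioned above can be sketched as follows; the heap-based implementation and the example probabilities are my own illustrative choices:

```python
import heapq
import math
from itertools import count

def huffman_lengths(probs):
    """Code-word lengths of an optimal (Huffman) instantaneous binary
    code: repeatedly merge the two least probable groups; every symbol
    in a merged group sits one level deeper in the code tree."""
    tie = count()                      # deterministic tie-breaking
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        for i in a + b:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), a + b))
    return lengths

probs = [0.4, 0.3, 0.2, 0.1]
L = huffman_lengths(probs)
H = -sum(p * math.log2(p) for p in probs)
avg = sum(p * l for p, l in zip(probs, L))
assert H <= avg < H + 1   # the bounds of Eq. (2.7)
```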
This interpretation, though appealing, is not wholly satisfactory. Why such
emphasis on instantaneous codes? Why should the length ℓ_j be the relevant amount
of information, when the code word represents the state j only in the sense of
a somewhat arbitrary look-up table? Where is the necessary dependence of the
amount of information on the prior information? These questions can be answered
by framing the discussion in terms of algorithmic information theory, which deals
with the length of programs on a universal computer. Because the programs are
required to be "self-delimiting," they constitute an instantaneous code. The code-
word length ℓ_j becomes the length of a minimal program which, when added to
a minimal program for generating the probability assignment p(j|π), causes the
universal computer to produce a complete description of the state j. This minimal
program length is the irreducible information content of the state j, relative to the
prior information, and has every right to be called the information that the memory
acquires when it finds the system in state j.
Before turning to algorithmic information, however, it is useful to list the prop-
erties of statistical information for two (or more) quantities.8 For that purpose,
consider two quantities, J and K, which take on values j and k. To describe the as-
signment of probabilities in this situation, one says that a memory, based on prior in-
formation π, assigns joint probability p(j,k|π) = p(j|π) p(k|j,π) = p(k|π) p(j|k,π)
to find values j and k. One defines the conditional statistical information

H(K|J,π) = − Σ_{j,k} p(j,k|π) log p(k|j,π) .
ALGORITHMIC INFORMATION
Algorithmic information theory4,5,6,7,15,16,19,20,23 has been developed over the
last twenty-five years, in large part to make rigorous the notion of a random num-
ber. Here I give only the barest summary of the principal ideas. More extensive
summaries, aimed at the physics applications pursued here, can be found in the
two recent papers by Zurek.26,27
Algorithmic information theory deals with a universal computer—for example,
a universal Turing machine—which computes binary strings—sequences of 0's and
1's—and n-tuples of binary strings. I use the letters p, q, and r to denote binary
strings, and I let |q| be the length of the string q. Suppose one settles on a particular
universal computer. A program p for this computer is a string such that, when
the computer is presented with p, it embarks on a computation that halts after
producing an output string. The program must halt, and as there is no provision
for program end markers, p must carry within itself information about when to
halt. Such programs, called self-delimiting, constitute an instantaneous code: no
program can be a prefix for any other program.
The absolute algorithmic information I(q) of a string q is the length of the
minimal program q*—the program of shortest length—that produces q as its output:

I(q) ≡ |q*| , q* the minimal program for q .   (2.14)
Choosing a different universal computer can change the length of the minimal pro-
gram by an additive constant that depends on the choice of computer, but not on
the length of q. Thus, to within this constant, algorithmic information is precisely
defined and quantifies the irreducible information content of the string q. Reflect-
ing this computer dependence, equalities and inequalities in algorithmic information
theory are proven to within 0(1)—i.e., to within the computer-dependent additive
constant. Following Zurek,26 I use "physicist's notation"—≃, ≲, ≳—to denote
approximate equality and inequality to within O(1).
Some strings are algorithmically simple; such a string can be generated by a
program much shorter than its own length. Most strings, however, are algorithmi-
cally random in the sense that the simplest way to generate them is to list the entire
string. Indeed, the absolute algorithmic information of a typical (random) string is
The leading term Iql is the number of bits needed to list the entire string; the
logarithmic term log IqI is the number of bits needed to specify the length IqI of the
string, information the program needs in order to be self-delimiting.
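A toy self-delimiting encoding makes the idea concrete. This is not the construction used in the formal theory: here each length bit is doubled, giving roughly 2 log₂|q| bits of overhead rather than log|q|, and the function names are my own:

```python
def self_delimiting(q):
    """Encode bit string q so a reader knows where it ends: send len(q)
    in binary with every bit doubled, then '01' as a terminator, then q.
    Doubled bits are '00' or '11', so '01' never occurs in the header."""
    n = format(len(q), "b")
    header = "".join(bit * 2 for bit in n) + "01"
    return header + q

def read_self_delimiting(stream):
    """Return (q, rest of stream) -- no external end marker is needed."""
    bits, i = "", 0
    while stream[i:i + 2] != "01":
        bits += stream[i]
        i += 2
    n = int(bits, 2)
    start = i + 2
    return stream[start:start + n], stream[start + n:]

q = "10110100"
decoded, rest = read_self_delimiting(self_delimiting(q) + "0011")
assert decoded == q and rest == "0011"
```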
Extension of algorithmic information to n-tuples of strings is straightforward.
For example, the joint algorithmic information I(q, r) is the length of the minimal
program to generate string q followed by string r, the two strings separated by
some punctuation such as a comma. Though straightforward, the extension to n-
tuples reveals the importance of the restriction to self-delimiting programs. With
self-delimiting programs, it is easy to convince oneself that I(q,r) ≲ I(q) + I(r),
because minimal programs for q and r can be concatenated.5,6,19,20 In contrast, if
the programs are not self-delimiting, the concatenated program needs to contain
information about where one program ends and the next begins; as a consequence,
the inequality holds only if one adds logarithmic corrections of order log I(q) +
log I(r) to the right side.5,6
The generalization to n-tuples allows one to define conditional algorithmic in-
formation. Suppose the computer is given the minimal program r* for r as an
"oracle." One may then consider programs which, when added to r*, cause the
computer to calculate an output string; notice that these conditional programs,
being self-delimiting, form an instantaneous code. Now let q*_r be the minimal pro-
gram that must be added to r* so that the computer produces q as its output. The
conditional algorithmic information I(q|r) is the length of q*_r:

I(q|r) ≡ |q*_r| .   (2.16)

It is crucial in this definition of I(q|r) that the computer be given the minimal
program r*, rather than r, as an oracle [equivalently, it could be given r and I(r)].5
With this definition, it is possible to show that I(q|r) ≲ I(q); if the computer is
given only r instead, this inequality holds only if one adds logarithmic corrections
Entropy and Information 99
of order log I(r) to the right side.5 One can now define the mutual algorithmic
information

I(r;q) ≡ I(q) − I(q|r) ≳ 0 ,   (2.17)

which quantifies the extent to which knowledge of r* allows the computer to use a
shorter program to compute q.
Relations among the various kinds of algorithmic information can be summa-
rized by

I(q) + I(r|q) ≃ I(q,r) ≃ I(r,q) ≃ I(r) + I(q|r) ,   (2.18)

I(r;q) = I(q) − I(q|r) ≃ I(r) − I(r|q) = I(q;r) .   (2.19)
Aside from the 0(1) equalities, these relations are identical to those for statis-
tical information [Eqs. (2.12) and (2.13)], yet algorithmic information deals with
individual strings whereas statistical information is couched in terms of averages.
Return to the system introduced above and its states j. Since algorithmic informa-
tion is defined in terms of binary strings, associate with each state j a specifying
string r_j, which completely describes the state. For a classical physical system, r_j is
a list of phase-space coordinates (to some accuracy), which specifies a phase-space
cell; for a quantum-mechanical system, r_j might be an energy eigenvalue that spec-
ifies an energy eigenstate. The absolute algorithmic information I(r_j) of state j is
the length of the minimal program r*_j to generate the specifying string r_j.
The memory uses prior information π to enumerate system states and assign
them probabilities. To include this prior information within the language of algo-
rithmic information theory, imagine that the memory stores a program which, when
presented to the universal computer, causes the computer to list the states and their
probabilities. This sort of program has been considered by Zurek.27 To formalize
it, let p_{J|π} denote an n-tuple of strings that consists of the specifying strings r_j for
all states with non-zero probability and, associated with each string, its probability
p(j|π). Since I want to use the full apparatus of algorithmic information theory—in
particular, to have self-delimiting programs for computing p_{J|π}—I insist that the
number 𝒥 of system states (with non-zero probability) be finite (unlike Zurek) and
that the probabilities be given to some finite accuracy. For a physical system in
thermodynamic equilibrium, this requirement can be accommodated by using the
microcanonical ensemble or by using the canonical ensemble with an algorithmically
simple maximum energy. The irreducible information content of the prior informa-
tion π is I(p_{J|π}) = |p*_{J|π}|, the length of the shortest program p*_{J|π} to compute p_{J|π}.
I call I(p_{J|π}) the algorithmic prior information.
Two fundamental inequalities place upper and lower bounds on the average
absolute algorithmic information:

H(J|π) ≤ Σ_j p(j|π) I(r_j) ≲ H(J|π) + I(p_{J|π}) .   (3.1)
Bennett pointed out the relevance of these bounds to the entropy of physical sys-
tems, and Zurek27 has given an extensive discussion of their importance in that
context. That the computer programs form an instantaneous code leads immedi-
ately to the left inequality in Eq. (3.1). The right inequality is related to inequal-
ity (2.6) for Shannon-Fano codes. I give here a précis of Zurek's proof27 of the right
inequality, which in turn is based on the proof of Zvonkin and Levin.29
The minimal program p*_{J|π} alone generates the specifying strings r_j and their
probabilities p(j|π). Suppose one adds to p*_{J|π}, first, a sorting algorithm that sorts
the strings r_j in order of decreasing probability (or in some other well-defined order
for equal probabilities), and, second, a coding algorithm that assigns to each string
r_j a Shannon-Fano code word, whose length satisfies ℓ_j < −log p(j|π) + 1. The
sorting and coding algorithms are algorithmically simple; their overall length I_sc
can be included in the computer-dependent additive constant. One can now obtain
a program to generate a particular specifying string r_j by adding to the above
program the code word for r_j. Since the code word is part of an instantaneous code,
there is no need to supply the length of the code word. The result is a program for
computing r_j whose length [to within O(1)] is given by I(p_{J|π}) + I_sc + ℓ_j ≲ I(p_{J|π}) −
log p(j|π), where the latter inequality follows from including I_sc and the 1 from ℓ_j
in the computer-dependent additive constant. The minimal program for generating
r_j being no longer than this one, one arrives at an inequality

I(r_j) ≲ I(p_{J|π}) − log p(j|π) ,   (3.2)

whose average over p(j|π) yields the right inequality in Eq. (3.1).
Important though Eq. (3.1) is, it is unsatisfactory because it is couched in
terms of absolute algorithmic information. More useful are inequalities involving the
conditional algorithmic information I(r_j|p_{J|π}), which is the length of the minimal
program for r_j when the computer is given p*_{J|π} as an oracle. If the memory stores
p*_{J|π}, then I(r_j|p_{J|π}) is the length of the additional program that the memory must store
to generate r_j; thus I(r_j|p_{J|π}) may be interpreted as the amount of information that
the memory acquires when it finds the system in state j, given the prior information
π. Since the conditional programs form an instantaneous code, their average length
is bounded below by the statistical information:

Σ_j p(j|π) I(r_j|p_{J|π}) ≥ H(J|π) .   (3.3)

An argument parallel to the one leading to Eq. (3.2) bounds the joint information:

I(p_{J|π}, r_j) ≲ I(p_{J|π}) − log p(j|π) .   (3.4)
Writing the left side of this inequality as I(p_{J|π}, r_j) ≃ I(r_j) + I(p_{J|π}|r_j) yields a
new upper bound for I(r_j) in terms of mutual information,

I(r_j) ≲ −log p(j|π) + I(r_j; p_{J|π}) .   (3.5)
Averaging this inequality over p(j|π) leads to an upper bound tighter than the one
in Eq. (3.1),

Σ_j p(j|π) I(r_j) ≲ H(J|π) + Σ_j p(j|π) I(r_j; p_{J|π})   (3.6)

[tighter because I(r_j; p_{J|π}) ≲ I(p_{J|π})]. If, instead, one writes the left side of Eq. (3.4)
as I(p_{J|π}, r_j) ≃ I(p_{J|π}) + I(r_j|p_{J|π}), one obtains the inequality

I(r_j|p_{J|π}) ≲ −log p(j|π) .   (3.7)
Averaging this inequality over p(j|π) and combining the resulting upper bound
with the lower bound (3.3) yields the desired double inequality for the average
conditional algorithmic information,

H(J|π) ≤ Σ_j p(j|π) I(r_j | p_{J|π}) ≤ H(J|π) + O(1) , (3.8)

with the understanding that H(J|π) provides a strict lower bound on the average
conditional algorithmic information. Equation (3.9) translates to
There are, of course, algorithmically simple numbers with much smaller algorithmic
information, but they are few compared to the typical (random) numbers.
Suppose now that the memory assigns uniform probabilities p(j|π) = 1/J;
the statistical information has its maximum value H(J|π) = log J. The essential
information needed to enumerate J and assign uniform probabilities is the number
J, so the algorithmic prior information is
an estimate that accords with the double inequality (3.8). Eqs. (3.11) and (3.13)
combine to give an estimate for the mutual algorithmic information,
which is the length of a typical string r_j—an interpretation that makes sense of a
final estimate

I(p_{J|π} | r_j) = I(p_{J|π}) − I(r_j; p_{J|π}) ≈ log J . (3.15)
The estimate for the algorithmic prior information in this first example is driven by
the assumption that J is a typical (random) number. As a second example, let the
strings r_j be the binary strings of length N ≫ 1; then J = 2^N is algorithmically simple.
Suppose again that the memory assigns uniform probabilities, so that H(J|π) =
N = log J. Proceeding as in the first example, one can estimate the various kinds
of algorithmic information for typical strings r_j:
The key difference lies in the estimate for the algorithmic prior information I(p_{J|π}):
as in the first example, the essential information needed to enumerate J and assign
uniform probabilities is the number J = 2^N, but here J can be specified by giving
the number N, which requires only log N bits (potential terms of order log log N are
ignored). In this example, it takes much less information to generate the probability
assignment than to specify a typical string. The uniform probabilities are concisely
describable, and the double inequality (3.1) is tight.
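The asymmetry can be checked with a toy count (the value of N here is an arbitrary illustrative choice): describing the uniform ensemble over length-N binary strings costs only about log₂ N bits—enough to transmit the number N itself—while describing a typical member string costs N bits:

```python
import math

N = 1_000_000          # length of the binary strings; the ensemble has 2**N members

bits_for_typical_string = N                  # a typical (random) string is incompressible
bits_for_ensemble = math.ceil(math.log2(N))  # enough bits to transmit the number N itself

assert bits_for_ensemble < bits_for_typical_string
print(bits_for_ensemble)  # 20
```

Twenty bits describe the whole ensemble, while any one typical member needs a million.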
Suppose that, instead of uniform probabilities, the memory assigns probabilities
of 1/2, corresponding to H(J|π) = 1, to each of two strings, q_0 and q_1. For a case of
algorithmically simple strings—let q_0 be the string of N 0's and q_1 be the string of
N 1's—the estimates for algorithmic information become
where α can be 0 or 1. In contrast, for two typical strings that are algorithmically
independent—i.e., I(q_0; q_1) ≈ 0—and, more generally, for M algorithmically
independent typical strings of length N assigned equal probabilities 1/M, the
estimates become

I(q_α) ≈ N + log N ,   I(p_{J|π}) ≈ MN + log N ,
I(q_α; p_{J|π}) ≈ N + log N ,
I(q_α | p_{J|π}) ≈ log M ,   I(p_{J|π} | q_α) ≈ (M − 1)N ,
(3.19)

where α = 1, …, M. In these last three cases, I include only leading-order contributions that can be reliably estimated. In the latter two cases, it takes more—in
the last case, much more—information to assign probabilities than to specify any
single string; as a consequence, the double inequality (3.1) becomes meaningless.
104 Carlton M. Caves
stores the system energy to n ≪ log Ξ binary digits, corresponding to energy
resolution δE = 2^{−n+1}E. The resulting number of "accessible" states, J = ρ(E)δE,
corresponds to statistical information

H(J|π) = log J = log(ρ(E)δE) = S_2(E) − n + 1 , (3.20)

where

S_2(E) = log(ρ(E)E) = log Ξ (3.21)

is the microcanonical thermodynamic entropy in bits. For a macroscopic system
the entropy has typical size S_2(E) ∼ 10^{23}; notice that the length of a typical
specifying string is S_2(E_j) bits. The statistical information H(J|π) is often called
the entropy10,21; for any remotely reasonable value of n, of course, it doesn't matter
whether S_2(E) or H(J|π) is called entropy. Indeed, the success of statistical physics
depends on this indifference. The motivation for calling S_2(E) the thermodynamic
entropy is that it doesn't depend on the subjective resolution δE.
To specify a typical microstate j, one gives the length of the specifying string
r_j, which requires log log Ξ_j bits, and then gives the entire string r_j, which requires
log Ξ_j bits. Hence, the absolute algorithmic information of a typical microstate j is

I(r_j) = log Ξ_j + log log Ξ_j = S_2(E_j) + log S_2(E_j) . (3.22)
The Hamiltonian and boundary conditions, which are algorithmically simple, can be
used to calculate the eigenvalues, but one should not conclude, as a consequence,
that the eigenvalues are algorithmically simple. Suppose a universal computer is
handed the Hamiltonian and begins calculating energy eigenvalues from the ground
state up. If the computer is to generate a particular eigenvalue Eh it needs suffi-
cient instructions to recognize that eigenvalue and halt. The algorithmic information
content of these instructions is typically log(number of eigenstates with eigenvalue
smaller than E1).--.1og(p(Ej)Ei) = S2(Ej), in agreement with Eq. (3.22). Zurek27
uses this argument to show how algorithmic information increases during the tem-
poral evolution of a classical ergodic Hamiltonian system.
The simplicity of the Hamiltonian and boundary conditions manifests itself in
generating simple classes of eigenvalues. Indeed, to assign the microcanonical probabilities, the memory must store the Hamiltonian and boundary conditions (I_0 bits)
and the energy to n significant binary digits (n bits); in addition, it must know the
number of microstates to be included, J = ρ(E)δE = 2^{H(J|π)}, which requires only
log H(J|π) bits, because the number of microstates is algorithmically simple. These
last log H(J|π) bits can also be viewed as specifying the energy range in natural
units or as giving the number of zeroes that follow the n significant digits in the binary representation of E. With this information a universal computer can calculate
the energy eigenvalues within the energy range δE and assign them uniform probabilities; i.e., it can calculate the n-tuple p_{J|π} for the microcanonical probabilities.
Thus the algorithmic prior information for the microcanonical ensemble is

I(p_{J|π}) ≈ I_0 + n + log H(J|π) = I_0 + n + log S_2(E) . (3.23)
Each significant binary digit in E, beyond the first few, corresponds to a bit of
reduction in the statistical information (3.20). The term log S_2(E) in Eq. (3.23)
has the appealing interpretation that it is the number of bits needed to say what
the entropy is.
The microcanonical probabilities are concisely describable—i.e., I(p_{J|π}) ≪
H(J|π). Indeed, any probability assignment for a macroscopic system must have
algorithmic prior information much smaller than the algorithmic information of a
typical microstate; otherwise, no practical memory could store the required algo-
rithmic prior information.
The conditional algorithmic information of a typical microstate j can be estimated to be

I(r_j | p_{J|π}) ≈ log(ρ(E)δE) = H(J|π) = S_2(E) − n , (3.24)

an estimate that accords with the double inequality (3.8) and that leads to further
estimates,

I(r_j; p_{J|π}) ≈ n + log S_2(E) ,   I(p_{J|π} | r_j) ≈ I_0 . (3.25)
p(j|k,π) = δ_{k,g(j)} p(j|π)/p(k|π) , (3.28)

where g(j) denotes the branch that contains state j.
The conditional probability (3.26) also implies three equivalent conditions on statistical information,
any of which could be taken as the definition of a tree search. If the memory
finds that the system occupies branch k, the reduction in statistical information is
H(J|π) − H(J|k,π), which has average value
The argument that leads to Eq. (3.7), modified so that it starts with the minimal
program for p_{J,K|π} and so that the sorting and coding algorithms apply to the sets
J_k, yields the inequality
Combining the resulting averaged upper bound with the lower bound (3.31), one
obtains the desired double inequality involving H(K|π),
which implies that H(K|π) and Σ_k p(k|π) I(p_{J|k,π} | p_{J,K|π}) are equal to within
O(1). The double inequality (3.33) establishes the information balance alluded to
earlier: when the memory discovers that the system occupies branch k, the average
reduction in statistical information is essentially equal to the average algorithmic information that the memory acquires. The lower bound in Eq. (3.33) applies directly
to an analysis of Maxwell's demons.
The first approach works out from the trunk of the tree to the initial branches.
The second approach starts at the terminal branches and works back to the
initial branches. First rewrite the double inequality (3.8) using the prior information
p_{J,K|π} that is now appropriate:
Next find a double inequality that involves the conditional statistical information
H(J|k,π). The exact analogue of Eq. (3.8) involves an average of the conditional
algorithmic information I(r_j | p_{J|k,π}) over p(j|k,π), but the same double inequality
holds if one also includes the prior information p_{J,K|π} in the information given the
oracle. The result is a double inequality
Treating the double inequalities (3.34) and (3.35) as O(1) equalities, one finds, after
a bit of manipulation, the O(1) equality
Recalling that H(K|π) = H(J; K|π) leads to an attractive interpretation: the mutual
statistical information between J and K is essentially the same as the average
mutual algorithmic information (given the prior information) between state j and
the branch g(j) that branches to j.
That the second approach gives the same O(1) equality as the first follows from
noting that I(p_{J|g(j),π} | p_{J,K|π}, r_j) ≈ 0, since in the presence of the prior
information p_{J,K|π}, specification of a state j provides enough information to describe the
initial branch g(j) that branches to j. As a consequence, the mutual algorithmic
information satisfies the O(1) equality
Zurek27 takes this analysis a step further by allowing the memory to be a "de-
mon," which can replace its record by an algorithmically simpler one. For example,
should the demon-memory find all the molecules on the left—admittedly a rare
occurrence—it could store the record in the much shorter form of a minimal pro-
gram for generating a string of N 0's, and it could complete the cycle by erasing
far fewer than N bits. Although it is generally not possible to find the minimal
program for a given string or prove that a program that works is the minimal one,
it is interesting, nonetheless, to inquire whether a demon-memory that could find
or guess minimal programs could also violate the Second Law.
To address this question, it is useful to abstract the description of the N-box
Szilard engine to a more general engine. In doing so, there are two equivalent meth-
ods of analysis. The first method identifies a subsystem that does isothermal work
as it extracts heat from a heat reservoir. This subsystem is described by a canonical
ensemble with temperature determined by the reservoir temperature. In the case of
the Szilard engine, this subsystem consists of the N molecules. The second method
regards the subsystem and the reservoir as a single isolated system, described by
the microcanonical ensemble. The information-gathering (compression) phase of the
cycle restricts this system to a part of its "phase space," from which it "expands"
as it does adiabatic work during the expansion phase of the cycle.
The second method is simpler and better suited to the previous discussion of a
system with a finite number of accessible microstates. Thus consider again a system
with microstates j. This system begins its cycle in an initial macrostate, which has
energy E^{(i)} and thermodynamic entropy S^{(i)}. The corresponding temperature
T_1 is given by

1/T_1 = (dS/dE)|_{E=E^{(i)}} . (4.2)

S_k = H(J|k,π) k_B ln 2 , (4.3)
To extract work during the expansion phase, the memory uses a device like
the pistons in the Szilard engine. The memory must use its record to configure this
device so that it matches the observed sub-macrostate k—i.e., so that it "fits" the
configuration that some subsystem has when the system is in sub-macrostate k. For
the Szilard engine, this means inserting the pistons into the empty sides of the boxes.
The subsystem does isothermal work as it "expands" to its original configuration;
equivalently, the entire system does adiabatic work W_k^{(+)}. The entire system is left
in a final macrostate, which has reduced energy,
The final energy E^{(f_k)} and final entropy S^{(f_k)} can depend on the observed
sub-macrostate k (they don't for the simple Szilard engine, but they would if the
Szilard engine were modified so that the partitions divided the boxes into unequal
volumes).
The work extracted can be related to the change in entropy,

W_k^{(+)} = E^{(i)} − E^{(f_k)} = [S^{(i)} − S^{(f_k)}] / (dS/dE)|_{E=E^{(i)}} = T_1 [S^{(i)} − S^{(f_k)}] , (4.6)

W_k^{(+)} / (k_B T_1 ln 2) = H(J|π) − H(J|k,π) = −log p(k|π) . (4.7)
The rightmost equality, though not true for a general one-stage tree search, holds
here because the probability to find the system in sub-macrostate k is p(k|π) = J_k/J.
The average work extracted during the cycle is
A conventional thermodynamic analysis views the memory from the outside, assigns
probability p(kI 7) to its various records k, and concludes that its entropy increases
during a cycle from zero to
η = Σ_k p(k|π)[W_k^{(+)} − W_k^{(−)}] / Σ_k p(k|π) W_k^{(+)} ≤ 1 − T_2/T_1 , (4.11)

which is Kelvin's efficiency limit for a heat engine operating between two heat
reservoirs.
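As a quick numerical check of the limit (4.11), with illustrative reservoir temperatures (not from the text):

```python
def kelvin_efficiency_limit(T1, T2):
    """Efficiency bound (4.11): eta <= 1 - T2/T1 for an engine run
    between a hot reservoir at T1 and a cold one at T2 (kelvin)."""
    if not T1 > T2 > 0:
        raise ValueError("need T1 > T2 > 0")
    return 1.0 - T2 / T1

# Illustrative temperatures (not from the text):
eta_max = kelvin_efficiency_limit(600.0, 300.0)
print(eta_max)  # 0.5
```

Halving the cold-reservoir temperature relative to the hot one caps the efficiency at one half, however the engine is operated.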
Zurek27,28 advocates taking the "inside view"—i.e., analyzing the cycle from
the point of view of a demon-memory, which is smart enough to shorten, if it can, its
record of the sub-macrostate k. It is then crucial to understand precisely what the
demon knows at each stage of the cycle—what does the demon know, and when does
it know it? At the start of the cycle, the demon knows enough to specify the initial
system macrostate, but it must also know how to describe all the sub-macrostates,
else it could not be ready with a device that can be configured to "fit" all possible
sub-macrostates. Thus, the demon knows enough to generate a list of the accessible
microstates, assign them equal probabilities, and group them into the subsets J_k.
In the notation used above, the demon stores enough information to compute the
n-tuple p^{(i)}_{J,K|π}, where the superscript (i) designates the initial macrostate. After
the cycle, the demon, to be ready for the next cycle, must know enough to compute
the corresponding n-tuple p^{(f)}_{J,K|π} for the final macrostate. If the system is a heat
reservoir, however, the n-tuples p^{(i)}_{J,K|π} and p^{(f)}_{J,K|π} are the same, the
distinguishing superscripts may be dropped, and one may refer to the demon's "standard state"
at the beginning (or end) of a cycle. The minimum amount of information that the
demon stores in its standard state is the algorithmic prior information I(p_{J,K|π}).
To see why the two n-tuples are the same, refer to the previous discussion of
the microcanonical ensemble: the n-tuples are identical if the energy change W_k^{(+)}
is much smaller than the resolution δE = 2^{−n+1}E^{(i)} used to define the microcanonical ensemble—i.e., if 1 ≫ 2^n W_k^{(+)}/E^{(i)} ∼ 2^n (C T_1/E^{(i)}) |δT_1/T_1|, where C
is the system's heat capacity. This condition, essentially the same as the condition
|δT_1/T_1| ≪ 1 for the system to be a heat reservoir, is satisfied by a sufficiently
macroscopic system. Treating the initial and final macrostates as the same ignores
the changes in system energy and entropy. These changes are essential for analyzing
the operation of the engine, but they are completely obscured in defining the
system macrostates by any reasonable energy resolution δE.
When the demon observes and records sub-macrostate k, it must store enough
information, beyond the prior information, to describe that sub-macrostate—i.e.,
to compute the n-tuple p_{J|k,π} that lists the states in J_k and assigns them equal
conditional probabilities p(j|k,π). To get back to its standard state, ready for the
next cycle, the demon must erase information from its memory. Zurek26 shows that
the minimum number of bits that must be erased is (in the notation used here) the
conditional algorithmic information I(p_{J|k,π} | p_{J,K|π}). Zurek's argument is subtle
and relies on the properties of reversible computers, so it is fortunate that his
conclusion, as he emphasizes, makes eminent good sense: the minimum number of
bits that must be erased is the minimum amount of information the demon needs,
beyond the prior information, to describe sub-macrostate k.
Invoking Landauer's principle, one can now say that the demon must supply work

W_k^{(−)} ≥ I(p_{J|k,π} | p_{J,K|π}) k_B T_2 ln 2 (4.12)

to erase the useless bits and return to its standard state, if the bits are erased
into an environment at temperature T_2. Should the demon find the system in an
algorithmically simple sub-macrostate—i.e., I(p_{J|k,π} | p_{J,K|π}) ≪ H(K|π)—it can
beat Kelvin's efficiency limit, but on the average—the only sense in which the
Second Law is meant to apply—the demon must supply work

Σ_k p(k|π) W_k^{(−)} ≥ Σ_k p(k|π) I(p_{J|k,π} | p_{J,K|π}) k_B T_2 ln 2 ≥ H(K|π) k_B T_2 ln 2 , (4.13)

which is identical to Eq. (4.10) and leads again to Kelvin's efficiency (4.11). The
crucial last inequality in Eq. (4.13) follows from the strict left inequality in Eq. (3.33)
and justifies the attention devoted to inequalities for a tree search.
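The bound (4.12) is easy to evaluate numerically. A minimal sketch, with an illustrative record length and temperature (only the Boltzmann constant is a fixed physical input):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)

def min_erasure_work(bits, T2):
    """Landauer bound, as in Eq. (4.12): erasing `bits` bits into an
    environment at temperature T2 costs at least bits * k_B * T2 * ln 2."""
    return bits * K_B * T2 * math.log(2)

# Illustrative record: 1000 bits erased at 300 K (numbers not from the text).
W = min_erasure_work(1000, 300.0)
print(W)  # about 2.9e-18 J
```

The cost is minuscule on laboratory scales, but it is strictly positive, which is what closes the demon's cycle against the Second Law.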
ACKNOWLEDGMENTS
This work was supported in part by the Faculty Research and Innovation Fund at
the University of Southern California.
REFERENCES
1. Bennett, C. H. "The Thermodynamics of Computation—A Review." Intl. J.
Theor. Phys. 21 (1982):905-940.
2. Bennett, C. H. "Demons, Engines, and the Second Law." Sci. Am. 257(5)
(November 1987):108-116.
3. Bennett, C. H. "Notes on the History of Reversible Computation." IBM J.
Res. Develop. 32 (1988):16-23.
4. Chaitin, G. J. "On the Length of Programs for Computing Finite Binary Se-
quences." J. Assoc. Comp. Mach. 13 (1966):547-569.
5. Chaitin, G. J. "A Theory of Program Size Formally Identical to Information
Theory." J. Assoc. Comp. Mach. 22 (1975):329-340.
6. Chaitin, G. J. "Algorithmic Information Theory." IBM J. Res. Develop. 21
(1977):350-359.
7. Chaitin, G. J. Information, Randomness, and Incompleteness: Papers on Al-
gorithmic Information Theory. Singapore: World Scientific, 1987. A collection
of Chaitin's papers.
8. Gallager, R. G. Information Theory and Reliable Communication. New York:
Wiley, 1968. An introduction to statistical information and its application to
communication theory.
9. Gibbs, J. W. "Elementary Principles in Statistical Mechanics." In The Col-
lected Works of J. Willard Gibbs, Vol. II, Part One. New Haven: Yale Univer-
sity, 1948.
10. Huang, K. Statistical Mechanics. New York: Wiley, 1963. A standard text-
book, more advanced than Ref. 21.
11. Huffman, D. A. "A Method for the Construction of Minimum Redundancy
Codes." Proc. IRE 40 (1952):1098-1101.
12. Jaynes, E. T. Papers on Probability, Statistics and Statistical Physics, edited
by R. D. Rosenkrantz. Dordrecht, Holland: Reidel, 1982. A collection of
Jaynes's papers, unrivaled for clarity and persuasiveness.
13. Jaynes, E. T. "Clearing up Mysteries: The Original Goal." In Maximum En-
tropy and Bayesian Methods, edited by J. Skilling. Dordrecht, Holland: Rei-
del, 1989,1-27.
14. Jaynes, E. T. "Probability in Quantum Theory." This volume.
15. Kolmogorov, A. N. "Three Approaches to the Quantitative Definition of In-
formation." Problemy Peredachi Informatsii 1(1) (1965):3-11. English trans-
lation in Prob. Inform. Transmission 1 (1965):1-7.
16. Kolmogorov, A. N. "Logical Basis for Information Theory and Probability
Theory." IEEE Trans. Inform. Theory IT-14 (1968):662-664.
17. Landauer, R. "Irreversibility and Heat Generation in the Computing Pro-
cess." IBM J. Res. Develop. 5 (1961):183-191.
18. Landauer, R. "Dissipation and Noise Immunity in Computation and Commu-
nication." Nature 335 (1988):779-784.
Complexity of Models
1. INTRODUCTION
The fundamental task of trying to learn the properties of the mechanism generating
a set of observed data is called the problem of model construction or modeling.
This in general involves sorting out the relevant variables, which may or may not
be directly observed as a part of the data, and trying to discover possible causal
relationships amongst them. Modeling problems are at the heart of all scientific
activity, and it is no wonder that they pose formidable difficulties even in the simplest
cases, such as the curve-fitting problems, which have been a subject of systematic
study at least since the time of Gauss.
There have been persistent difficulties in properly formalizing the intuitive ideas
that we have about modeling, at the root of which—we think—is the question of
how to deal with the complexity of the models themselves. An obvious attempt has
been to avoid the issue by declaring that the data have been generated by some
relatively easy to describe "true" machinery, a "law," and the inevitable deviations
are ascribed to "noise" stemming from various sources. While such a strategy works
to a degree in simple situations, we are left in the cold when our preconceived ideas
about the "true" machinery fail to produce a "law" with which the deviations could
with good conscience be regarded as just instrument noise. To be sure, there are
The main task then is one of encoding, or rather estimating the code length
with which the encoding could be done while using the models in each class. What
is particularly fortunate and perhaps surprising is that we can provide excellent
estimates of the code length for virtually all the interesting classes of models, and
the result changes the way that we can do statistics.
2. MODELS
Intuitively, the idea of a model is a mathematical description of the relationship
between the selected variables with a number of free parameters to be fitted to
the data. This implicitly recognizes the fact that no matter what model we pick,
the "law" it expresses for the observed data does not quite hold—whether because
of "measurement errors" or other sources that we failed to include. There is no
fundamental difference between "random" errors and the "systematic" errors due
to our failure to include all the relevant variables. Both are manifestations of our
inability to fully explain the data. With this in mind, we propose the sweeping claim
that "all models are fundamentally probabilistic," or at any rate for our purposes
they can be expressed as such. More specifically, consider the observed data of the
kind (y, x) = (V', xn) = (Yi, x1), • • • , (Yn, xn), where we think the xt- variables to
influence the yt- variables, both in general having several components. The first
part in the model or, more properly, model class is a "deterministic law" pt =
Fw,v-1 1.6,,,) which allows us to predict the observed numbers yt as a parametric
function of the observations available at t. The parameter 61 is a collection of k
real-valued components 81 , . • - ,Ok , and their number is also to be determined. The
second part in the model is a probability distribution P(yt 10t) with which we model
the deviations, taken as independent since we imagine the deterministic part to
account for the dependencies between yt and yr for t 0 r. We can therefore write
the probability that such a model assigns to the observed data as
n
P(ylx, 9) = 11
t=1
P(yt It).
3. STOCHASTIC COMPLEXITY
We begin by recalling Shannon's coding theorem. Let C be a one-to-one coding
function from a discrete set X into the set B* of all finite binary strings. Let the
length, i.e., the number of binary digits, of C(x) be L(x). A code is said to be a
prefix code if the lengths satisfy the fundamental Kraft inequality:

Σ_{x∈X} 2^{−L(x)} ≤ 1 . (3.1)

Shannon's theorem states that for any prefix code the mean code length is bounded
from below by the entropy of the distribution P,

Σ_x P(x)L(x) ≥ −Σ_x P(x) log P(x) , (3.2)

and that the lower bound can be reached only if the lengths satisfy the equality
L(x) = −log P(x) for every x. In this sense, then, we know how the code should
be designed for a given distribution; because of this, we regard −log P(x) as the
Shannon complexity of x relative to the "model" P.
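The theorem can be verified numerically for a small illustrative distribution (not from the text): any lengths satisfying the Kraft inequality give a mean length at least the entropy, with equality at L(x) = −log P(x):

```python
import math

# Illustrative dyadic distribution (not from the text).
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

entropy = -sum(p * math.log2(p) for p in P.values())

# Ideal lengths L(x) = -log2 P(x); integers here, so a prefix code
# attaining them exists (e.g., a -> 0, b -> 10, c -> 110, d -> 111).
ideal = {s: -math.log2(p) for s, p in P.items()}
assert sum(2.0 ** -l for l in ideal.values()) <= 1.0   # Kraft inequality
mean_ideal = sum(P[s] * ideal[s] for s in P)
assert abs(mean_ideal - entropy) < 1e-12               # lower bound attained

# Any other admissible length assignment does no better.
other = {"a": 2, "b": 2, "c": 3, "d": 3}
assert sum(2.0 ** -l for l in other.values()) <= 1.0
assert sum(P[s] * other[s] for s in P) >= entropy
print(entropy)  # 1.75 bits
```

The dyadic probabilities are chosen so the ideal lengths are integers; for general distributions the entropy bound is approached to within one bit, as in the Shannon-Fano construction.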
Our task at hand is to generalize the just-outlined coding program to data
which are modeled not by a single distribution but by a whole class M = {P(y|x, θ)},
where θ ranges over some subset Ω^k of the k-dimensional Euclidean space, and y
and x denote sequences of numbers truncated so as to have a countable range.
First, for each fixed parameter value, we know from Shannon's work that it
takes about
L(y|x, θ) = −log P(y|x, θ) (3.3)
bits to encode the data. However, the decoding can be done only if the decoder
knows the parameter value that the encoder used. Whenever the parameters range
over the reals, it is clear that to describe them by a finite binary string they must be
truncated to a finite precision. For simplicity, take the precision to be the same,
δ = 2^{−q}, for all of them. Then we can write each parameter with q bits plus the number
of bits needed to write the integer part, which in fact turns out to be ignorable.
If θ̂ = θ̂(y|x) denotes the maximum likelihood estimate, which minimizes the code
length (3.3), then since the truncation of θ̂ to the precision δ may deviate from
the optimum by as much as δ in each component, the code length (3.3) after the
truncation is larger than the minimized code length. The larger the δ we pick, the
larger this increase will be in the worst case, while at the same time it will require
fewer bits to describe the truncated parameters. There is then an optimal worst-case
precision, which can be found by expanding Eq. (3.3) in a Taylor series about θ̂.
The result is that the optimal precision depends on the size of the observed data
set as follows: −log δ = (1/2) log n, and we get the total code length as

L(y|x, k) = −log P(y|x, θ̂) + (k/2) log n (3.4)
with or without prior knowledge. Having such a prior π(θ) we can eliminate the
inherent redundancy in the code length, and the result is as follows,15

I(y|x, M) = −log P(y|x, M) , (3.5)

where

P(y|x, M) = ∫ P(y|x, θ) π(θ) dθ . (3.6)

We call Eq. (3.5) the stochastic complexity of the data, given the model class M.
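The two-part construction of the preceding paragraphs—the data cost at the fitted parameters plus roughly (1/2) log n bits per truncated parameter—can be sketched as follows (the numbers are illustrative, and the sketch omits the O(1) terms):

```python
import math

def two_part_code_length(data_bits, k, n):
    """Two-part code length: -log P(y|x, theta_hat) plus about
    (1/2) log2 n bits for each of the k truncated parameters."""
    return data_bits + 0.5 * k * math.log2(n)

# Illustrative numbers: n = 100 observations, k = 3 parameters,
# and 250 bits for the data at the maximum-likelihood estimate.
total = two_part_code_length(250.0, 3, 100)
print(round(total, 2))  # 259.97
```

Doubling the data set raises the parameter cost by only half a bit per parameter, which is why the penalty term matters mainly when competing model classes differ in k.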
As a justification for calling Eq. (3.5) the stochastic complexity, we now describe
in an informal manner a theorem which may be viewed as an extension of
Shannon's coding theorem. For a precise statement we refer to Rissanen.12 The
theorem hinges on the general assumption about the models, taken here to be defined
by the densities f(y^n|x^n, θ), that there must be some estimator θ̂(y^n|x^n) which
converges in probability to the parameter θ defining the data-generating distribution.
The convergence rate for very general classes of models is 1/√n per parameter. It
follows then that, no matter which distribution—say, a density function g(y^n|x^n)—one
picks, the following inequality holds:

−E_θ log g(y^n|x^n) ≥ −E_θ log f(y^n|x^n, θ) + ((k − ε)/2) log n (3.7)

for all positive numbers ε and all θ except some in a set whose volume goes to zero
as n grows.
If we take the density g as the one resulting from our best efforts to estimate the
data-generating distribution, we see that not only is the left-hand side bounded
from below by the entropy, but it must exceed it by a definite amount, which
simply represents the uncertainty inherent in any estimation process. If we divide
both sides by n, we see that this uncertainty (but not the first term) reduces to zero
at the given maximum rate as we get more data and learn more about the data-generating
machinery. We also see at once that Eq. (3.4) as a code length cannot be
improved upon asymptotically. Further, one can show under general conditions,14
that Eq. (3.5) is smaller than Eq. (3.4) for large n, and hence, in particular, it is
also asymptotically optimal.
The general objective in model building is to search for a model class which
minimizes Eq. (3.5). After the class is found, including the number of its free pa-
rameters, we can find the corresponding optimal parameter values and hence the
optimal model, if desired. Frequently, the complexity (3.5) is expressed in terms of
a density function, and in such a case it does not immediately represent a real code
length. It sometimes happens that the density function, written now as f(ylx, M),
is very peaked, which implies that the simple process of calculating the probability
of the truncated data by f (Mx, M)bn may be too crude and may lead to incorrect
code length for the data.
We illustrate the computation of the stochastic complexity with the polynomial
curve-fitting probleni on data from Picard and Cook.8
EXAMPLE 1
From 20 flocks of Canada geese the numbers x_i, i = 1, …, 20, of adult birds were
estimated as 10, 10, 12, 20, 40, 40, 30, 30, 20, 20, 18, 35, 35, 35, 30, 50, 30, 30, 45,
and 30, respectively. The same flocks were also photographed, from which the true
numbers of adult birds y_i, i = 1, …, 20, were counted. Written in the same order
as the corresponding estimates, they are as follows: 9, 11, 14, 26, 57, 56, 38, 38, 22,
22, 18, 43, 42, 42, 34, 62, 30, 30, 48, and 25. We wish to fit a polynomial predictor
as in Eq. (2.1). With a quadratic deviation measure (y_t − ŷ_t)²/τ, where τ is a
parameter, the distribution (2.2) is gaussian with mean ŷ_t and variance τ/2. We
select the prior π(θ) also as gaussian, with mean zero and covariance (τ/c)I, where
I is the k × k identity matrix and c a further "nuisance" parameter to be picked
in a moment. For τ we pick the so-called conjugate prior,2
where a is another "nuisance" parameter. The reason for choosing these priors is
simply that we can get the integral (3.6) written in a closed form. The two
nuisance parameters can then be selected so that the complexity is minimized,
which gives the final criterion
where K(n) depends only on n and can be dropped. Further, the elements of the
matrix X are given as x_{ij} = x_j^{i−1}, i = 1, …, k, j = 1, …, n, and R is the
minimized sum of the squared deviations Σ_t (y_t − ŷ_t)².
The stochastic complexities as a function of k come out as I(y|x, 1) = 123.5,
I(y|x, 2) = 93.7, I(y|x, 3) = 102.6, and I(y|x, 4) = 115.7. Hence, the minimizing
polynomial is linear. Its coefficients are given by θ̂_1 = −3.8, θ̂_2 = 1.3, and the
resulting line fits the data well. In fact, when the plot is made, most human
observers, using personal judgment, would pick this line rather than a higher-degree
polynomial as the best fit.
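The flavor of this computation can be reproduced with ordinary least squares on the data above. The sketch below is not the closed-form criterion of the text: it uses a BIC-style two-part code length, (n/2) log(R/n) + (k/2) log n in nats, as a stand-in, so its numbers differ from the stochastic complexities quoted above:

```python
import math

x = [10, 10, 12, 20, 40, 40, 30, 30, 20, 20, 18, 35, 35, 35, 30, 50, 30, 30, 45, 30]
y = [9, 11, 14, 26, 57, 56, 38, 38, 22, 22, 18, 43, 42, 42, 34, 62, 30, 30, 48, 25]
n = len(x)

def polyfit_rss(deg):
    """Least-squares polynomial fit of degree `deg` via the normal
    equations X'X a = X'y, with x_ij = x_j**(i-1); returns the residual
    sum of squares R."""
    k = deg + 1
    A = [[sum(xj ** (i + j) for xj in x) for j in range(k)] for i in range(k)]
    b = [sum((xj ** i) * yj for xj, yj in zip(x, y)) for i in range(k)]
    for c in range(k):                      # Gaussian elimination, partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p], b[c], b[p] = A[p], A[c], b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [u - f * v for u, v in zip(A[r], A[c])]
            b[r] -= f * b[c]
    a = [0.0] * k
    for r in reversed(range(k)):            # back substitution
        a[r] = (b[r] - sum(A[r][c] * a[c] for c in range(r + 1, k))) / A[r][r]
    pred = [sum(a[i] * xj ** i for i in range(k)) for xj in x]
    return sum((p - yj) ** 2 for p, yj in zip(pred, y))

def code_length(deg):
    """BIC-style two-part code length in nats for a gaussian model with
    k = deg + 1 fitted coefficients."""
    k = deg + 1
    return 0.5 * n * math.log(polyfit_rss(deg) / n) + 0.5 * k * math.log(n)

print([round(code_length(d), 1) for d in range(4)])
```

For the linear fit the residual sum of squares comes out near 386, consistent with the coefficients θ̂ = (−3.8, 1.3) quoted in the text, and the penalized code length of the linear model is far below that of the constant model.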
In virtually all applications the mean constraints are so selected that they equal
the actually measured values ā = A(x) = (A_1(x), …, A_k(x)), where x now denotes
the particular observed data. Then it is true that

max H(A) = log Z(λ_ā) + λ_ā′ ā = min −log p(x|λ) . (4.4)

We thus see that the requirement of maximum entropy coincides with the
requirement of the shortest code length when the constraints are taken as known.
However, these are not known or unique in model building, and in order to be able
to compare fairly several conceivable suggestions, we should add to the code length
(4.4) the code length needed to describe these suggestions, which is nothing but
the general MDL principle. In an approximation, then, we should minimize
REFERENCES
1. Chaitin, G. J. Algorithmic Information Theory. Cambridge: Cambridge Uni-
versity Press, 1987.
2. Cox, D. R., and D. V. Hinkley. Theoretical Statistics. London: Chapman and
Hall, 1974.
3. Jaynes, E. "Information Theory and Statistical Mechanics." Phys. Rev. 106
(1957):620.
4. Jaynes, E. "Information Theory and Statistical Mechanics." Phys. Rev. 108
(1957):171.
5. Jaynes, E. "On the Rationale of Maximum Entropy Methods." Proc. of
IEEE, Special Issue on Spectral Estimation, edited by S. Haykin, 70
(1982):939-952.
6. Kemeny, J. "The Use of Simplicity in Induction." Phil. Rev. 62 (1953):391-
408.
7. Kolmogorov, A. N. "Three Approaches to the Quantitative Definition of In-
formation." Problems of Information Transmission 1 (1965):4-7.
8. Picard, R. R., and R. D. Cook. "Cross-Validation of Regression Models."
JASA 79(387) (1984):575-583.
9. Rissanen, J. "Modeling by Shortest Data Description." Automatica 14
(1978):465-471.
10. Rissanen, J. "A Universal Prior for Integers and Estimation by Minimum De-
scription Length." Ann. of Stat. 11 (1983):416-431.
11. Rissanen, J. "Universal Coding, Information, Prediction, and Estimation."
IEEE Trans. Inf. Theory IT-30 (1984):629-636.
12. Rissanen, J. "Stochastic Complexity and Modeling." Annals of Statistics 14
(1986):1080-1100.
13. Rissanen, J. "A Predictive Least Squares Principle." IMA Journal of Mathe-
matical Control and Information, 3(2-3) (1986):211-222.
14. Rissanen, J. "Stochastic Complexity." The Journal of the Royal Statistical
Society 49 (1987):223-239 and 252-265 (with discussions).
15. Rissanen, J. Stochastic Complexity in Statistical Inquiry. New Jersey: World
Scientific Publ. Co., 1989.
16. Schwarz, G. "Estimating the Dimension of a Model." Ann. of Stat. 6
(1978):461-464.
17. Solomonoff, R. J. "A Formal Theory of Inductive Inference." Part I, Informa-
tion and Control 7 (1964):1-22.
18. Solomonoff, R. J. "A Formal Theory of Inductive Inference." Part II, Infor-
mation and Control 7 (1964):224-254.
C. H. Woo
Center for Theoretical Physics, Department of Physics and Astronomy, University of Mary-
land, College Park, MD 20742
2N numbers. Even after one uses Newton's law to obtain an economical model
to explain this specific data set, the input still contains N + O(1) numbers. The
number of bits assigned to the laws of motion is in the O(1) term, which would
be insignificant compared to N when N is large. In this example of laboratory
experiment, the N numbers corresponding to the boundary condition (the initial
heights) are arbitrary and uninteresting, and it makes sense to ignore the boundary
condition and concentrate on the laws. But, in the case where a description of our
specific world is contemplated, as in quantum cosmology or in accounting for any
natural feature which is not an inevitable result of the laws alone, the boundary
condition represents indispensable input information. Then a minimal input pro-
gram, which contains the information about such specific features, can be very long
indeed. When the program itself is much longer than the instruction for simulating
one computer on another, the notion of minimal programs becomes meaningful.
Since we will be concerned with the economy of models which include the
boundary conditions, we want to state clearly what we mean by "boundary condi-
tions": we include in boundary conditions any input information over and above the
laws of motion which is needed to describe the specific behavior of a system. Thus,
a "boundary condition" includes the initial condition, but can contain much more.
In particular, in a quantal description of a sequence of events in our specific world,
the input includes the initial density matrix (initial condition), the unitary evolu-
tion (the laws), and the information needed to specify one "consistent history"4,6,10
from among all the possible consistent histories; thus, "boundary condition" in this
case includes the first and the third type of information.
Suppose one considers the conditional probability for the occurrence of a certain
event in terms of the initial density matrix ρ_0 and a sequence of time-ordered
projections E_j(n_j) in the Heisenberg picture (where j refers to the nature of the
observable and n_j to the eigenvalue or a group of eigenvalues):

P(E_i(n_i)) = Tr(E_i(n_i) ··· E_1(n_1) ρ_0 E_1(n_1) ··· E_i(n_i)) /
              Tr(E_{i-1}(n_{i-1}) ··· E_1(n_1) ρ_0 E_1(n_1) ··· E_{i-1}(n_{i-1})).   (1)
As one traces the history (that is, repeats Eq. (1) for decreasing values of i), two
types of projections should be distinguished: those which yield conditional proba-
bilities near one and those which do not. Projections which affirm classical links,
to the extent that they are favored by the unitary evolution, are in the first cate-
gory, whereas the association of our world with one specific history following any
branching of histories is in the second category. It is the second type that signifi-
cantly changes the input information, since in this case the E_i(n_i), which becomes
part of our history, is not determined by being strongly favored by the dynamics
and the previous history, as is the case for the first type of projections. If both the
initial condition and the laws are simple, then almost all the information content of
our specific world arises from the second type of projections, that is, from amplified
quantum fluctuations. There is a prevailing attitude that the essence of quantum
Laws and Boundary Conditions 129
measurements resides in the first stage when a pure state becomes locally a mix-
ture, and not in the second stage when one alternative in the mixture becomes
identified with the actual outcome. The reasoning is that this second stage is "just
like what happens in classical probability." But in classical probability one accepts
the arbitrariness of the outcome of a coin toss because, one thought, the deter-
mining factors are numerous and microscopic. When the link between the theory
and the observations in our specific world still involves arbitrary elements even in
a putatively self-contained quantum theory, this fact deserves to be underscored
and the stage at which arbitrariness enters clearly identified. In any case, in terms
of contributions to the minimal program describing our specific world, it is in the
second stages that a substantial amount of algorithmic information enters.
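Eq. (1) can be checked numerically for a single qubit. In this sketch the initial condition, ρ_0 = |0><0|, and the two projectors (onto |+> and then onto |1>) are our own illustrative choices; the conditional probability of the second projection given the first comes out to 1/2:

```python
# 2x2 matrix helpers (plain nested lists; entries may be complex in general).
def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

rho0 = [[1.0, 0.0], [0.0, 0.0]]   # initial condition |0><0|
E1 = [[0.5, 0.5], [0.5, 0.5]]     # projector onto |+> = (|0>+|1>)/sqrt(2)
E2 = [[0.0, 0.0], [0.0, 1.0]]     # projector onto |1>

chain1 = mul(E1, mul(rho0, E1))   # E1 rho0 E1
chain2 = mul(E2, mul(chain1, E2)) # E2 E1 rho0 E1 E2
p = trace(chain2) / trace(chain1) # Eq. (1) with i = 2
```

Here neither projection is "strongly favored" by the dynamics: both outcomes of E1 are equally likely, so this step is of the second, information-injecting type.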
P(w) = Σ 2^(-|p|),   (2)

where the sum runs over all programs p which produce w as output. Let p*(w) be a
minimal-length program, and let |p*(w)| = I(w). I(w) is called the
algorithmic information (or algorithmic complexity) of w. It is also approximately
equal to the algorithmic entropy H(w) = -log2 P(w):

I(w) = H(w) + O(1).   (3)
Although Eq. (3) says that an individual string with a short minimal program is
overwhelmingly more likely to be produced compared to one with a significantly
longer minimal program, one must reckon with the fact that there are many more
long programs than short ones. When one compares categories of output strings,
the highly compressible category versus the incompressible category, it is no longer
true that the former is overwhelmingly favored. Let us define the expectation value
E(a) of an attribute a(w) of strings w by E(a) ≡ Σ_w a(w)P(w), and denote the
length of w by n(w); then from the Appendix we see that when we consider the
limit of long strings as n(w) → ∞:
(i) shows that there is a non-negligible probability for the occurrence of highly
compressible strings, whereas (ii) shows that there is also a non-negligible proba-
bility for the occurrence of nearly incompressible strings. In short, the probability
for categories is broadly distributed. A broad distribution is not very informative;
still, it should be noted that this conclusion is radically different from the naive
expectation that, because there are more long programs, the chance of getting a
short program at random is exponentially small. What the algorithmic information
theory brings out is that there are also factors working in favor of the occurrence of
outputs with short programs. (We will discuss a physical analog of this mechanism
later.)
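The weight 2^(-|p|) attached to programs can be explored empirically with a toy machine. The sketch below feeds short random programs to a small Brainfuck-style interpreter — our own stand-in for a universal computer, with a step bound in place of true halting detection — and tallies the outputs. Simple outputs such as the empty string dominate, while a broad tail of longer outputs survives, in the spirit of the broad distribution over categories described above:

```python
import random

def run(prog, max_steps=500):
    """Run a tiny Brainfuck-like program on a 32-cell tape; return its output
    string, or None if brackets are unmatched or the step budget is exceeded."""
    match, stack = {}, []
    for i, ch in enumerate(prog):
        if ch == '[':
            stack.append(i)
        elif ch == ']':
            if not stack:
                return None
            j = stack.pop()
            match[i], match[j] = j, i
    if stack:
        return None
    tape, ptr, pc, out, steps = [0] * 32, 0, 0, [], 0
    while pc < len(prog) and steps < max_steps:
        ch = prog[pc]
        if ch == '+': tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == '-': tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == '>': ptr = (ptr + 1) % 32
        elif ch == '<': ptr = (ptr - 1) % 32
        elif ch == '.': out.append(tape[ptr])
        elif ch == '[' and tape[ptr] == 0: pc = match[pc]
        elif ch == ']' and tape[ptr] != 0: pc = match[pc]
        pc += 1
        steps += 1
    if steps >= max_steps:
        return None
    return ''.join(chr(c) for c in out)

random.seed(1)
alphabet = '+-<>[].'
counts = {}
for _ in range(20000):
    prog = ''.join(random.choice(alphabet) for _ in range(8))
    w = run(prog)
    if w is not None:
        counts[w] = counts.get(w, 0) + 1

ranked = sorted(counts, key=counts.get, reverse=True)
```

With this machine and seed, the most frequent outputs are the shortest ones, yet many distinct longer outputs also occur with non-negligible total weight.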
Once one finds a minimal or near-minimal program for a given string w, it has
not only descriptive value but also projective value, in the sense that it will be
useful in coding the probable extensions of w. By an extension we mean here the
concatenation (w, x) of w with another string x. From Chaitin,1 theorem 15b:
where the ratio in the braces is the conditional probability that w is extended into
(w, x), c is O(1), and I(x/w) is the length of the minimal program for producing
x if the computer already has available to it p*(w). If the same computer is used
for producing w and (w, x), and if there are not many near-minimal programs for
them, even r does not differ from unity by many orders of magnitude. Then Eq. (6)
implies that the minimal program of the original string w facilitates the economical
coding of its probable extensions.
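A compression analogue of this projective value can be demonstrated with zlib's preset-dictionary mode: supplying w as a dictionary plays the role of the computer already having p*(w) available, and the extension x then codes into fewer bits. Compressed sizes are only a loose proxy for I(x) and I(x/w), and the repeated phrase is an arbitrary illustrative choice:

```python
import zlib

# w is the already-modeled string; x is an extension statistically similar to w.
w = b"the quick brown fox jumps over the lazy dog. " * 20
x = b"the quick brown fox jumps over the lazy dog. " * 5

def compressed_size(data, dictionary=None):
    """Size of data under zlib, optionally conditioned on a preset dictionary."""
    if dictionary is None:
        c = zlib.compressobj(9)
    else:
        c = zlib.compressobj(9, zdict=dictionary)
    return len(c.compress(data) + c.flush())

unconditional = compressed_size(x)              # a proxy for I(x)
conditional = compressed_size(x, dictionary=w)  # a proxy for I(x/w)
```

Conditioned on w, the extension needs only back-references into the dictionary, so its code length drops sharply — the compression counterpart of a minimal program facilitating economical coding of probable extensions.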
Having looked at the a priori probabilities, we now return to the idea of regard-
ing certain facts about our specific world as a data string, and its minimal program
as a theoretical model. Obviously we will never be able to deal with all the facts—
the cosmologists, for instance, ignore many of the fine details of our universe, but
it is always understood that more features can be added later. By addition we do
not mean just an extension in time, but also improvements in accuracy, inclusion of
previously ignored variables, etc. Then, if we envision that at some point the data
that one tries to understand are rich enough in information so that one expects,
say, I(w) > 10^6 bits, it makes sense to search for maximum economy even in
modeling the boundary conditions. Although a near-minimal program does not have the
universal applicability of the laws, it is like the laws in that its utility goes beyond
a mere economical summary of the data in question and may extend to data not
yet studied.
There are different ways whereby noises can become finitely effective; it can be
shown that some one-dimensional cellular arrays possess computational universality
with the effective parts of noises as inputs, but the models that we have found
so far are somewhat artificial. No doubt the list of physical systems capable of
universal computation will continue to grow, and the relevance of these a priori
probabilities will be assured if there are many such systems with noises as inputs.
If, for example, some spin systems have computational universality with finitely
effective noises as inputs, then one can predict how the complexities are distributed
for the configurations resulting from a fixed initial state.
In this kind of application, it is essential that many similar copies of a specific
type of system exist, so that the probability distributions have empirical relevance.
The situation is different when non-duplicated features are the subject of study.
Although the application of the generic features of a probabilistic theory to the
single universe to which we have access has become a not-so-rare practice, to at-
tribute objective reality to the alternative universes of a probabilistic theory would
be a deviation from the principle of verifiability (provided the universes are strictly
non-communicating, either through tunneling or otherwise). The only justification
that I know of for applying the generic features of a probabilistic theory to a single
specimen with an already known behavior is as an economical model for the extant
data, with the utility of the model to be checked by the persistence of that economy
in explaining extensions of the data.
enter into the world's history: quasi-classical projections and amplified quantum
fluctuations. We can compress the description of the first type in the absence of
chaos, but how much chaos and other instabilities enhance the role of the second
type in our history is not known. Therefore, in our opinion, it may be premature
to ask: "Why is the world compressible?" (because we do not know if the degree of
compressibility will turn out to be truly remarkable); it may be better to first ask
a different question: "How complex is our world?" This is the same as asking for
an estimate of the algorithmic information in our universe. Today the cosmologists
have an estimate for the total entropy in the universe, but apparently not even a
rough idea about its algorithmic entropy.
In pointing out the ambiguity in compressibility, we do not mean to deny that
the existence of universal laws is remarkable by itself. If we look at only selective
aspects of the world and concern ourselves with only certain regularities, the econ-
omy of the laws is truly impressive. What we have argued is that this economy is
not suitably expressed as the brevity of the minimal program for our specific world;
however, it is possible that the economy of laws is related to the brevity of the core
of the program. The notion of a core can be illustrated with a particular universal
computer studied by Chaitin.1 It reads the first portion of a program and interprets
it as a symbolic expression defining a function in a "pure version" of LISP, and the
remainder of the program serves as the "input data." In this case the S-expression
is what we call the core of the program. The probability that the first k bits of a
random string happens to be a valid S-expression decreases as C·k^(-3/2) (Chaitin,1
appendix B); hence, it favors the occurrence of short S-expressions (in contrast,
the probability that the first n bits happens to be a valid program decreases only
slightly faster than c·n^(-1)). If one wants a machine-independent core length, one
could define it for other machines to be the same as for this computer through
simulation, so that the machine dependence enters only into the program length
and not the core length. In view of the permissive semantics of this "pure LISP,"
it is plausible that the higher probability for the occurrence of short S-expressions
implies also a high probability that even long minimal programs have short cores,
but we have not been able to prove it.
As this last section is more speculative than the previous ones, we summarize
the main points of the earlier sections. The a priori probabilities of algorithmic
information theory show how a certain amount of order can automatically emerge
from random inputs. These probabilities are relevant to complex physical systems
which have computational universality and which are influenced by random fluc-
tuations in the formative stage. For a system meeting the relevancy condition, the
implication for model building is that even a model which accounts for its particular
individual features can be much more economical than naively expected, because
out of the numerous fluctuations that it is exposed to, only a fraction is effective
in giving rise to robust features.
Hence,

E(log n / I) ≥ A + B Σ_{m>m₀} 1/log(m) = ∞.   (A2)

Eq. (5) follows from the fact that, with I < n, the number of strings with length
n is < 2^(n-I(n)+O(1)) and, hence, the number of such strings with I between n and
l_max = n + I(n) + O(1) is greater than or equal to 2^n (1 - 2^(-I(n)+O(1))). Here n in
the argument of I stands for the string which is the binary representation of the
number n. Then

E(I / n^(1-ε)) ≥ Σ_n n^ε 2^(-I(n)+O(1)) (1 - 2^(-I(n)+O(1))).   (A3)

The sum over n will be done by first summing over all numbers of a given length m,
and then summing over m. There is only a negligible fraction of the 2^(m-1) strings in
the first sum for which 2^(-I(n)) is of order 1, and the first sum is greater than or
equal to C·2^(εm)/m²; so the sum over m diverges.
ACKNOWLEDGMENTS
These topics were discussed at different times with Seth Lloyd, Ted Jacobson, and
Jim Hartle, and I thank them for helpful comments. I thank Charles Bennett for a
discussion of the possible effect of noises in the life game.
Some results on what were called "cores" in the last section have been ob-
tained by M. Koppel, who used a slightly different formulation. See M. Koppel,
"Structure," in The Universal Turing Machine, A Half-Century Survey, edited by
R. Herken (Oxford University Press, 1988). I thank Charles Bennett for bringing
this reference to my attention.
REFERENCES
1. Chaitin, G. Algorithmic Information Theory. Cambridge: Cambridge Univ.
Press, 1987.
2. Davies, P. C. W. "Why is the Physical World Understandable?" This volume.
3. Fredkin, E., and T. Toffoli. "Conservative Logic." Intl. J. Theor. Phys. 21
(1982):2,9.
4. Griffiths, R. B. "Consistent Histories and the Interpretation of Quantum Me-
chanics." J. Stat. Phys. 36 (1984):219.
5. Gacs, P., and J. Reif. "A Simple Three-Dimensional Real-Time Reliable Cel-
lular Array." J. Comput. Sys. Sci. 36 (1988):125.
6. Gell-Mann, M. "Entropy, Quantum and Classical Information, and Complex-
ity in the Universe." Report given at this workshop on collaborative work
with J. Hartle and V. Telegdi, 1989.
7. Kolmogorov, A. N. "Three Approaches to the Quantitative Definition of In-
formation." Prob. Info. Trans. 1 (1965):1.
8. Levin, L. A. "Various Measures of Complexity for Finite Objects." Sov. Math.
Dokl. 17 (1976):522.
9. Margolus, N. "Physics-Like Models of Computation." Physica 10D (1984):81.
10. Omnes, R. "The Interpretation of Quantum Mechanics." Phys. Lett. A125
(1987):170.
11. Solomonoff, R. J. "A Formal Theory of Inductive Inference." Info. & Control
7 (1964):1.
12. Zurek, W. H. "Algorithmic Randomness and Physical Entropy." Phys. Rev.
A40 (1989):4731-4751.
Charles H. Bennett
IBM Research, Yorktown Heights NY 10598, USA
INTRODUCTION
Natural irreversible processes are nowadays thought to have a propensity for self-
organization—the spontaneous generation of complexity (Figure 1). One may at-
tempt to understand the origin of complexity in several ways. One can attempt to
elucidate the actual course of galactic, solar, terrestrial, biological, and even cul-
tural evolution. One can attempt to make progress on epistemological questions
such as the anthropic principle3—the ways in which the complexity of the universe
is conditioned by the existence of sentient observers—and the question often raised
in connection with interpretations of quantum mechanics of what, if any, distinction
science should make between the world that did happen and the possible worlds that
might have happened. One can seek a cosmological "theory of everything" without
which it would seem no truly general theory of natural history can be built. Finally,
at an intermediate level of humility, one can attempt to discover general principles
governing the creation and destruction of complexity in the standard mathematical
models of many-body systems, e.g., stochastic cellular automata such as the Ising
model, and partial differential equations such as those of hydrodynamics or chemi-
cal reaction-diffusion. An important part of this latter endeavor is the formulation
of suitable definitions of complexity: definitions that on the one hand adequately
capture intuitive notions of complexity, and on the other hand are sufficiently ob-
jective and mathematical to prove theorems about. Below we list and comment on
several candidates for a complexity measure in physics, advocating one, "logical
depth," as most suitable for the development of a general theory of complexity in
many-body systems. Further details can be found in Bennett.6
LIFE-LIKE PROPERTIES
Life-like properties (e.g., growth, reproduction, adaptation) are very hard to de-
fine rigorously, and also are too dependent on function, as opposed to structure.
Intuitively, a dead human body is still complex, though it is functionally inert.
THERMODYNAMIC POTENTIALS
Thermodynamic potentials (entropy, free energy) measure a system's capacity for
irreversible change, but do not agree with intuitive notions of complexity. For ex-
ample, a bottle of sterile nutrient solution (Figure 2) has higher free energy, but
lower subjective complexity, than the bacterial culture it would turn into if inoc-
culated with a single bacterium. The rapid growth of bacteria following introduc-
tion of a seed bacterium is a thermodynamically irreversible process analogous to
crystalization of a supersaturated solution following introduction of a seed crystal.
Even without the seed either of these processes is vastly more probable than its
reverse: spontaneous melting of crystal into supersaturated solution, or transfor-
mation of bacteria into high-free-energy nutrient. The unlikelihood of a bottle of
sterile nutrient transforming itself into bacteria is therefore not a manifestation of
the second law, but rather of a putative new "slow growth" law that complexity,
however defined, ought to obey: complexity ought not to increase quickly, except
with low probability, but can increase slowly, e.g., over geological time as suggested
in Figure 1.
COMPUTATIONAL UNIVERSALITY
The ability of a system to be programmed through its initial conditions to simu-
late any digital computation. Computational universality, while it is an eminently
mathematical property, is still too functional to be a good measure of complexity
of physical states: it does not distinguish between a system capable of complex be-
havior and one in which the complex behavior has actually occurred. As a concrete
example, it is known that classical billiard balls,10 moving in a simple periodic po-
tential, can be prepared in an initial condition to perform any computation; but if
such a special initial condition has not been prepared, or if it has been prepared but
the computation has not yet been performed, then the billiard ball configuration
does not deserve to be called complex. Much can be said about the theory of uni-
versal computers; here we note that their existence implies that the input-output
relation of any one of them is a microcosm of all of deductive logic, and in particular
FIGURE 2 Complexity is not a thermodynamic potential like free energy. The second
law allows a bottle of sterile nutrient solution (high free energy, low complexity) to turn
into a bottle of bacteria (lower free energy, higher complexity), but a putative "slow
growth law" forbids this to happen quickly, except with low probability.
How to Define Complexity in Physics, and Why 141
ALGORITHMIC INFORMATION
Algorithmic Information (also called Algorithmic Entropy or Solomonoff-
Kolmogorov-Chaitin Complexity) is the size in bits of the most concise univer-
sal computer program to generate the object in question.1,8,9,14,19,20 Algorithmic
entropy is closely related to statistically defined entropy, the statistical entropy of
an ensemble being, for any concisely describable ensemble, very nearly equal to the
ensemble average of the algorithmic entropy of its members; but for this reason al-
gorithmic entropy corresponds intuitively to randomness rather than to complexity.
Just as the intuitively complex human body is intermediate in entropy between a
crystal and a gas, so an intuitively complex genome or literary text is intermediate
in algorithmic entropy between a random sequence and a perfectly orderly one.
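The ordering described here — random above text above crystal-like order — can be illustrated with an off-the-shelf compressor as a crude, computable stand-in for algorithmic entropy (zlib is of course not a minimal universal code, so the byte counts are only indicative):

```python
import random
import zlib

random.seed(0)

periodic = b"ab" * 500                      # a "crystal": perfectly orderly
text = (b"It was the best of times, it was the worst of times, "
        b"it was the age of wisdom, it was the age of foolishness, "
        b"it was the epoch of belief, it was the epoch of incredulity. ") * 6
noise = bytes(random.randrange(256) for _ in range(1000))   # a "gas": random

def size(s):
    """Compressed size in bytes: a rough proxy for algorithmic entropy."""
    return len(zlib.compress(s, 9))
```

The literary sample lands between the two extremes — which is exactly why compressed size tracks randomness, not intuitive complexity.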
LONG-RANGE ORDER
Long-Range Order, the existence of statistical correlations between arbitrarily re-
mote parts of a body, is an unsatisfactory complexity measure, because it is present
in such intuitively simple objects as perfect crystals.
weather news). Rather it reflects each newspaper's descent from a common causal
origin in the past. Similar correlations exist between genomes and organisms in
the biosphere, reflecting the shared frozen accidents of evolution. This sort of long-
range mutual information, not mediated by the intervening medium, is an attractive
complexity measure in many respects, but it fails to obey the putative slow-growth
law mentioned above: quite trivial processes of randomization and redistribution,
for example smashing a piece of glass and stirring up the pieces, or replicating
and stirring a batch of random meaningless DNA, generate enormous amounts of
remote non-additive entropy very quickly.
LOGICAL DEPTH
Logical Depth = Execution time required to generate the object in question by a
near-incompressible universal computer program, i.e., one not itself computable as
output of a significantly more concise program. Logical depth computerizes the Oc-
cam's razor paradigm, with programs representing hypotheses and outputs representing
phenomena: a hypothesis is considered plausible only if it cannot be reduced to a
simpler (more concise) hypothesis. Logically deep objects, in other words, contain
internal evidence of having been the result of a long computation or slow-to-simulate
dynamical process and could not plausibly have originated otherwise. Logical depth
satisfies the slow-growth law by construction.
THERMODYNAMIC DEPTH
The amount of entropy produced during a state's actual evolution has been proposed
as a measure of complexity by Lloyd and Pagels.16 Thermodynamic depth
can be very system-dependent: some systems arrive at very trivial states through
much dissipation; others at very nontrivial states with little dissipation.
portion of the history recent enough not to have been swamped by dynamically
amplified environmental noise.
FIGURE 3 An equilibrium crystal divided into regions 1 and 2; the joint entropy
satisfies S_{1,2} << S_1 + S_2.
THEORY OF COMPUTATION
The conjectured inequality of the complexity classes P and PSPACE is a necessary
condition, and the stronger conjecture of the existence of "one-way" functions7,15 is
a sufficient condition, for certain very idealized physical models (e.g., billiard balls)
to generate logical depth efficiently.
ERROR-CORRECTING COMPUTATION
What collective phenomena suffice to allow error-correcting computation and/or
the generation of complexity to proceed despite the locally destructive effects of
noise? In particular, how does dissipation favor the generation and maintenance of
complexity in noisy systems?
n Dissipation allows error-correction, a many-to-one mapping in phase space.
n Dissipative systems are exempt from the Gibbs phase rule. In typical
d-dimensional equilibrium systems with short-ranged interactions, barring sym-
metries or accidental degeneracy of parameters such as occurs on a coexistence
line, there is a unique thermodynamic phase of lowest free energy.4 This ren-
ders equilibrium systems ergodic and unable to store information reliably in
the presence of "hostile" (i.e., symmetry-breaking) noise. Analogous dissipative
systems, because they have no defined free energy in d dimensions, are exempt
from this rule. A (d + 1)-dimensional free energy can be defined, but varying
the parameters of the d-dimensional model does not in general destabilize one
phase relative to another.
n What other properties besides irreversibility does a system need to take ad-
vantage of the exemption from Gibbs phase rule? In general the problem is to
correct erroneous regions, in which the data or computation locally differs from
that originally stored or programmed into the system. These regions, which
may be of any finite size, arise spontaneously due to noise and to subsequent
propagation of errors through the system's normal dynamics. Local majority
voting over a symmetric neighborhood, as in the Ising model at low temper-
ature, is insufficient to suppress islands when the noise favors their growth.
Instead of true stability, one has a metastable situation in which small islands
are suppressed by surface tension, but large islands grow. Two methods are
known for achieving absolute stability in the presence of symmetry-breaking
noise.
Anisotropic Voting Rules4,12,17 in two or more dimensions contrive to shrink
arbitrarily large islands by differential motion of their boundaries. The rule is such
that any island, while it may grow in some directions, shrinks in others; eventually
the island becomes surrounded by shrinking facets only and disappears (Figure 4).
The requisite anisotropy need not be present initially, but may arise through spon-
taneous symmetry breaking.
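Toom's north-east-center majority rule is the standard concrete example of such an anisotropic voting rule (our own sketch, not the exact two-phase rule of Figure 4): each cell adopts the majority vote of itself, its northern neighbor, and its eastern neighbor, so any island of the minority phase erodes from its north-east corner and disappears. In the noiseless limit:

```python
# Toom's north-east-center (NEC) majority rule on an N x N grid.
N = 24

def step(grid):
    new = [[0] * N for _ in range(N)]
    for r in range(N):
        for c in range(N):
            # Vote of the cell itself, its northern and its eastern neighbor.
            votes = grid[r][c] + grid[(r - 1) % N][c] + grid[r][(c + 1) % N]
            new[r][c] = 1 if votes >= 2 else 0
    return new

# A large square island of the minority phase in a sea of zeros.
grid = [[0] * N for _ in range(N)]
for r in range(8, 16):
    for c in range(8, 16):
        grid[r][c] = 1

initial_island = sum(map(sum, grid))
for _ in range(3 * N):
    grid = step(grid)
survivors = sum(map(sum, grid))
```

A 1-cell survives only if its northern or eastern neighbor is also 1, and no 0-cell can flip, so the island is eaten away along its north-east staircase in time linear in its side; with weak symmetry-breaking noise added, small error islands are likewise eroded faster than they can grow, which is the stabilization mechanism described above.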
Hierarchical Voting Rules. These complex rules, in one or more dimensions,
correct errors by a programmed hierarchy of blockwise majority voting. The com-
plexity arises from the need of the rule to maintain the hierarchical structure, which
exists only in software.
SELF-ORGANIZATION
Is "self-organization," the spontaneous increase of complexity, an asymptotically
qualitative phenomenon like phase transitions? In other words, are there reason-
able models whose complexity, starting from a simple uniform initial state, not
only spontaneously increases, but does so without bound in the limit of infinite
space and time? Adopting logical depth as the criterion of complexity, this would
mean that for arbitrarily large times t most parts of the system at time t would
FIGURE 4 Anisotropic Voting Rules stabilize information against symmetry-breaking
noise. It is not difficult to find irreversible voting models in which the growth velocity
of a phase changes sign depending on boundary orientation (this is impossible in
reversible models, where growth must always favor the phase of lowest bulk free
energy). Here we show the fate of islands in an irreversible two-phase system in
which growth favors one phase (stippled) at diagonal boundaries and the other phase
(clear) at rectilinear boundaries. (a-c) An island of the clear phase becomes square and
disappears. Similarly (d-f) an island of the stippled phase becomes diamond-shaped
and disappears. Small perturbations of the noise perturb the boundary velocities slightly
but leave the system still able to suppress arbitrarily large fluctuations of either phase.
contain structures that could not plausibly have been generated in time much less
than t. A positive answer to this question would not explain the history of our
finite world, but would suggest that its quantitative complexity can be legitimately
viewed as an approximation to a well-defined property of infinite systems. On the
other hand, a negative answer would suggest that our world should be compared to
chemical reaction-diffusion systems that self-organize on a macroscopic but finite
scale, or to hydrodynamic systems that self-organize on a scale determined by their
boundary conditions, and that the observed complexity of our world may not be
"spontaneous" but rather heavily conditioned by the anthropic requirement that it
produce observers.
EQUILIBRIUM SYSTEMS
Which equilibrium systems (e.g., spin glasses, quasicrystals) have computationally
complex ground states?
DISSIPATIVE PROCESSES
Do dissipative processes such as turbulence, that are not explicitly genetic or com-
putational, still generate large amounts of remote non-additive entropy? Do they
generate logical depth? Does a waterfall contain objective evidence, maintained de-
spite environmental noise, of a nontrivial dynamical history leading to its present
state, or is there no objective difference between a day-old waterfall and a year-
old one? See Ahlers and Walden1 for evidence of fairly long-term pseudorandom
behavior near the onset of convective turbulence.
ACKNOWLEDGEMENTS
Many of the ideas in this paper were shaped in years of discussions with Gregory
Chaitin, Rolf Landauer, Peter Gacs, Geoff Grinstein, and Joel Lebowitz.
REFERENCES
1. Ahlers, G., and R. W. Walden. "Turbulence near Onset of Convection." Phys.
Rev. Lett. 44 (1980):445.
2. Barrow, J. D., and F. J. Tipler. The Anthropic Cosmological Principle. Ox-
ford: Oxford University Press, 1986.
INTRODUCTION
The dynamical behavior of complex information processing systems, and how those
behaviors may be improved by natural selection, or other learning or optimizing
processes, are issues of fundamental importance in biology, psychology, economics,
and, not implausibly, in international relations and cultural history. Biological evo-
lution is perhaps the foremost example. No serious scientist doubts that life arose
from non-life as some process of increasingly complex organization of matter and
energy. A billion years later we confront organisms that have evolved from simple
precursors, that unfold in their own intricate ontogenies, that sense their worlds,
categorize the states of those worlds with respect to appropriate responses, and in
their interactions form complex ecologies whose members coadapt more or less suc-
cessfully over ecological and evolutionary time scales. We suppose, probably rightly,
that Mr. Darwin's mechanism, natural selection, has been fundamental to this as-
tonishing story. We are aware that, for evolution to "work," there must be entities
which in some general sense reproduce, but do so with some chance of variation.
That is, there must be heritable variation. Thereafter, Darwin argues, the differ-
ences will lead to differential success, culling out the fitter, leaving behind the less
fit.
But, for at least two reasons, Darwin's insight is only part of the story. First,
in emphasizing the role of natural selection as the Blind Watchmaker, Darwin and
his intellectual heritors have almost come to imply that without selection there
would be no order whatsoever. It is this view which sees evolution as profoundly
historically contingent; a story of the accidental occurrence of useful variations ac-
cumulated by selection's sifting: evolution as the Tinkerer. But second, in telling
us that natural selection would cull the fitter variants, Darwin has implicitly as-
sumed that successive cullings by natural selection would be able to successively
accumulate useful variations. This assumption amounts to presuming what I shall
call evolvability. Its assumption is essential to a view of evolution as a tinkerer which
cobbles together ad hoc but remarkable solutions to design problems. Yet "evolv-
ability" is not itself a self-evident property in complex systems. Therefore, we must
wonder what the construction requirements may be which permit evolvability, and
whether selection itself can achieve such a system.
Consider the familiar example of a standard computer program on a sequential
von Neumann universal Turing machine. If one were to randomly exchange the order
of the instructions in a program, the typical consequence would be catastrophic
change in the computation performed.
Try to formulate the problem of evolving a minimal program to carry out some
specified computation on a universal Turing machine. The idea of a minimal pro-
gram is to encode the program in the shortest possible set of instructions, and per-
haps initial conditions, in order to carry out the desired computation. The length
of such a minimal program would define the algorithmic complexity of the compu-
tation. Ascertainment that a given putative minimal program is actually minimal,
however, cannot in general be carried out. Ignore for the moment the problem of
Requirements for Evolvability in Complex Systems 153
ascertainment, and consider the following: Is the minimal program itself likely to
be evolvable? That is, does one imagine that a sequence of minimal alterations in
highly compact computer codes could lead from a code which did not carry out the
desired computation to one which did?
I do not know the answer; nevertheless, it is instructive to characterize the
obstacles. Doing so helps define what one might mean by "evolvability." In order
to evolve across the space of programs to achieve a given compact code to carry
out a specified computation, we must first be able to ascertain that any given
program actually carries out the desired computation. Think of the computation
as the "phenotype," and the program as the "genotype." For many programs, it
is well known that there is no short cut to "seeing the computation" carried out
beyond running the program and observing what it "does." That is, in general,
given a program, we do not know what computation it will perform by any shorter
process than observing its "phenotype." Thus, to evolve our desired program, we
must have a process which allows candidate programs to exhibit their phenotypes,
then a process which chooses variant programs and "evolves" towards the target
minimal compact program across some defined program space. Since programs, and
if need be their input data, can be represented as binary strings, we can represent
the space of programs in some high-dimensional Boolean hyperspace. Each vertex
is then a binary string, and evolution occurs across this space to or toward the
desired minimal target program.
Immediately we find two problems. First, can we define a "figure of merit"
which characterizes the computation carried out by an arbitrary program—defines
its phenotype—which can be used to compare how "close" the phenotype of the
current program is to that of the desired target program? This requirement is impor-
tant since, if we wish to evolve from an arbitrary program to one which computes
our desired function, we need to know if alterations in the initial program bring
the program closer or further from the desired target program. The distribution of
this figure of merit, or, to a biologist, "fitness," across the space of programs defines
the "fitness landscape" governing the evolutionary search process. Such a fitness
landscape may be smooth and single peaked, with the peak corresponding to the
desired minimal target program, or may be very rugged and multipeaked. In the
latter case, typical of complex combinatorial optimization problems, any local evo-
lutionary search process is likely to become trapped on local peaks. In general, in
such tasks, attainment of the global optimum is an NP-complete problem, and an
evolutionary search will not attain the global optimum in reasonable time. Thus, the
second problem with respect to evolvability of programs relates to how rugged and
multipeaked the fitness landscape is. The answers are not known, but the intuition
is clear. The more compact the code becomes, the more violently the computation
carried out by the code changes at each minimal alteration of the code. That is,
long codes may have a variety of internal sources of redundancy which allows small
changes in the code to lead to small changes in the computation. By definition, a
minimal program is devoid of such redundancy. Thus, inefficient redundant codes
may occupy a landscape which is relatively smooth and highly correlated in the
sense that nearby programs have nearly the same fitness by carrying out similar
154 Stuart A. Kauffman
computations. But as the programs become shorter, small changes in the programs
induce ever more pronounced changes in the phenotypes. That is, the landscapes
become ever more rugged and uncorrelated. In the limit where fitness landscapes
are entirely uncorrelated, such that the fitness of "1-mutant" neighbors in the space
are random with respect to one another, it is obvious that the fitness of a neighbor
carries no information about which directions are good directions to move across
the space in an evolutionary search for global, or at least good, optima. Evolution
across fully uncorrelated landscapes amounts to an entirely random search pro-
cess where the landscape itself provides no information about where to search.10
In short, since minimal programs almost surely "live on" fully uncorrelated land-
scapes in program space, one comes strongly to suspect that minimal programs are
not themselves evolvable.
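The contrast between correlated and uncorrelated landscapes can be made concrete with a toy adaptive walk on bit strings. The two fitness functions below are illustrative assumptions, not from the text: the fraction of 1-bits serves as a smooth, single-peaked landscape, and an independent random value per string serves as a fully uncorrelated one.

```python
import random

def neighbors(s):
    """All 1-mutant (Hamming distance 1) neighbors of a bit tuple."""
    return [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(len(s))]

def adaptive_walk(fitness, start):
    """Greedy hill climb: move to the fittest 1-mutant neighbor until no
    neighbor improves on the current string, i.e., a local optimum."""
    s = start
    while True:
        best = max(neighbors(s), key=fitness)
        if fitness(best) <= fitness(s):
            return s
        s = best

N = 12
random.seed(1)
start = tuple(random.randint(0, 1) for _ in range(N))

# Smooth, correlated landscape: fitness is the fraction of 1-bits, so every
# neighbor's fitness says which direction is "uphill."
def smooth(s):
    return sum(s) / N

peak = adaptive_walk(smooth, start)   # climbs all the way to (1,...,1)

# Fully uncorrelated landscape: an independent random fitness per string;
# a neighbor's fitness carries no information about the rest of the space.
_table = {}
def rugged(s):
    if s not in _table:
        _table[s] = random.random()
    return _table[s]

local_opt = adaptive_walk(rugged, start)  # halts on some local peak
```

On the smooth landscape the walk reaches the unique global peak; on the uncorrelated one it merely stops at whatever local peak it stumbles into, exactly the trapping the text describes.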
Analysis of the conditions of evolvability, therefore, requires understanding:
1) What kinds of systems "live on" what kinds of "fitness landscapes"; 2) what
kinds of fitness landscapes are "optimal" for adaptive evolution; and 3) whether
there may be selective or other adaptive processes in complex systems which might
"tune" 1) and 2) to achieve systems which are able to evolve well.
Organisms are the paradigm examples of complex systems which patently have
evolved, hence now do fulfill the requirements of evolvability. Despite our fascina-
tion with sequential algorithms, organisms are more adequately characterized as
complex parallel-processing dynamical systems. A single example suffices to make
this point. Each cell of a higher metazoan such as a human harbors an identi-
cal, or nearly identical, copy of the same genome. The DNA in each cell specifies
about 100,000 distinct "structural" genes, that is, those which code for a protein
product. Products of some genes regulate the activity of other genes in a complex
regulatory web which I shall call the genomic regulatory network. Different cell
types in an organism, nerve cell, muscle cell, liver hepatocyte, and so forth, differ
from one another because different subsets of genes are active in the different cell
types. Muscle cells synthesize myoglobin; red blood cells contain hemoglobin. Dur-
ing ontogeny from the zygote, genes act in parallel, synthesizing their products, and
mutually regulating one another's synthetic activities. Cell differentiation, the pro-
duction of diverse cell types from the initial zygote, is an expression of the parallel
processing on the order of 10,000 to 100,000 genes in each cell lineage. Thus the
metaphor of a "developmental program" encoded by the DNA and controlling on-
togeny is more adequately understood as pointing to a parallel-processing genomic
dynamical system whose dynamical behavior unfolds in ontogeny. Understanding
development from the zygote, and the evolution of development, hence the evolv-
ability of ontogeny, requires understanding how such parallel-processing dynamical
systems might give rise to an organism, and be molded by mutation and selection.
Other adaptive features of organisms, ranging from neural networks to the
anti-idiotype network in the immune system, are quite clearly examples of parallel-
processing networks whose dynamical behavior and changes with learning, or with
antigen exposure, constitute the "system" and exhibit its evolvability.
monomers have almost always bound an oxygen, thus further increases in oxygen
concentration do not increase the amount bound per hemoglobin molecule. The
response saturates. This means that a graph of bound oxygen concentration as a
function of oxygen tension is S-shaped, or sigmoidal, starting by increasing slowly,
becoming increasingly steep, then passing through an inflection and bending over,
and increasing more slowly again to a maximal asymptote.
Positive cooperativity and ultimate saturation in enzyme systems, cell recep-
tor systems, binding of regulatory molecules to DNA regulatory sites,1,45,52 and
other places are extremely common in biological systems. Consequently, sigmoidal
response functions are common as well.
The vital issue is to realize that even with a "soft" sigmoidal function whose
maximum slope is less than vertical, coupled systems governed by such functions are
properly idealized by on/off systems. It is easy to see intuitively why this might
be so. Consider a sigmoidal function graphed on a plane, and on the same plane
a constant, or proportional response where the output response is equal to the
input, i.e., the slope is 1.0. The sigmoidal function is initially below the proportional
response. Thus a given input leads to even less output. Were that reduced output
fed back as the next input, then the subsequent response would be even less. Over
iterations the response would dwindle to 0. Conversely, the sigmoidal response
becomes steep in its mid-range and crosses above the proportional response. An
input above this critical crossing point leads to a greater than proportional output.
In turn, were that output fed back as the next input, the response would be still greater
than that input. Over iterations the response would climb to a maximal response.
That is, feedback of signals through a sigmoidal function tends to sharpen to an
all-or-none response.25,60 This is the basic reason that the "on/off" idealization of
a flip-flop in a computer captures the essence of its behavior.
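A minimal numerical sketch of this sharpening, using an assumed Hill function (exponent 4, half-maximal point 0.5) rather than any particular biochemical response: feeding the output back as the next input drives any starting value below the crossing point down toward 0, and any starting value above it up toward the maximal response.

```python
def hill(x, n=4, k=0.5):
    """A "soft" sigmoidal response: a Hill function, half-maximal at
    x = k, whose maximum slope is finite, not vertical."""
    return x ** n / (x ** n + k ** n)

def iterate(x, steps=60):
    """Feed the output back in as the next input, repeatedly."""
    for _ in range(steps):
        x = hill(x)
    return x

low = iterate(0.40)    # starts below the crossing point at x = 0.5
high = iterate(0.60)   # starts above it
```

The point x = 0.5 is an unstable fixed point of this particular map: trajectories starting below it dwindle toward 0, while those starting above it climb to the upper fixed point near 0.92, an all-or-none outcome despite the "soft" slope.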
In summary, logical switching systems capture major features of a homologous
class of nonlinear dynamical systems governed by sigmoidal functions because such
systems tend to sharpen their responses to extreme values of the variables. The
logical, or switching, networks can then capture the logical skeleton of such contin-
uous systems. However, the logical networks miss detailed features and in particu-
lar typically cannot represent the internal unstable steady states of the continuous
system. Thus Boolean networks are a caricature, but a good one, an idealization
which is very powerful, with which to think about a very broad class of continu-
ous nonlinear systems as well as switching systems in their own right. I stress that
it is now well established that switching systems are good idealizations of many
nonlinear systems.25 But just how broad the class of nonlinear systems which are
"homologous" in a useful sense to switching networks remains a large mathematical
problem.
F = 2^(2^K).    (1)
The number of possible Boolean functions increases rapidly as the number of inputs,
K, increases. For K = 2 there are 2^(2^2) = 16 possible Boolean functions. For K = 3
there are 256 such functions. But by K = 4 the number is 2^16 = 65,536, while for
K = 5 the number is 2^32 = 4.3 x 10^9. As we shall see, special subclasses of the
possible Boolean functions are important for the emergence of orderly collective
dynamics in large Boolean networks.
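These counts are easy to confirm by direct enumeration, since a Boolean function of K inputs is nothing more than a truth table with one 0/1 response per input combination. A short sketch:

```python
from itertools import product

def boolean_functions(K):
    """All Boolean functions of K inputs, each represented as a truth
    table: one 0/1 response for each of the 2**K input combinations."""
    return list(product((0, 1), repeat=2 ** K))

# Equation (1): F = 2^(2^K) possible functions of K inputs.
F = {K: 2 ** (2 ** K) for K in (1, 2, 3, 4, 5)}
```

Enumerating is feasible only for small K; by K = 5 the count already exceeds four billion, which is why the formula, not the list, is what matters.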
An autonomous Boolean network is specified by choosing for each binary ele-
ment which K elements will serve as its regulatory inputs, and assigning to each
binary element one of the possible Boolean functions of K inputs. If the network
has no inputs from "outside" the system, it is considered to be "autonomous." Its
behavior depends upon itself alone.
Figure 1(a) shows a Boolean network with three elements, 1, 2, and 3. Each re-
ceives inputs from the other two. 1 is governed by the AND function, 2 is governed
by the OR function, and 3 is governed by the OR function. The simplest class of
Boolean networks are synchronous. All elements update their activities at the same
moment. To do so each element examines the activities of its K inputs, consults its
Boolean function, and assumes the prescribed next state of activity. This is sum-
marized in Figure 1(b). Here I have rewritten the Boolean rules. Each of the 2^3 = 8
possible combinations of activities of the three elements corresponds to one state
of the entire network. Each state at one moment causes all the elements to assess
[Figure 1 appears here; see the caption below. The state transition table of
panel (b), reconstructed from the rules in panel (a), reads:

   T: 1 2 3     T+1: 1 2 3
      0 0 0          0 0 0
      0 0 1          0 1 0
      0 1 0          0 0 1
      0 1 1          1 1 1
      1 0 0          0 1 1
      1 0 1          0 1 1
      1 1 0          0 1 1
      1 1 1          1 1 1

Panel (c) draws the corresponding state transition graph with its three state
cycles; panel (d) draws the graph after mutating element 2 from OR to AND.]
FIGURE 1 (a) The wiring diagram in a Boolean network with three binary elements,
1,2,3, each an input to the other two. One element is governed by the Boolean AND
function, the other two by the OR function. (b) The Boolean rules of (a) rewritten
showing for all 2^3 = 8 states of the Boolean network at time T, the activity assumed by
each element at the next time moment, T + 1. Read from left to right this figure shows,
for each state, its successor state. (c) The state transition graph, or behavior field, of
the autonomous Boolean network in (a) and (b), obtained by showing state transitions
to successor states, (b), as connected by arrows, (c). This system has 3 state cycles.
Two are steady states (000) and (111), the third is a cycle with two states. Note that
(111) is stable to all single Hamming unit perturbations, e.g., to (110), (101), or (011),
while (000) is unstable to all such perturbations. (d) Effects of mutating rule of element
2 from OR to AND. From Origins of Order: Self Organization in Evolution by S. A.
Kauffman. Copyright © 1990 by Oxford University Press, Inc. Reprinted by permission.
the values of their regulatory inputs, and, at a clocked moment, assume the proper
next activity. Thus, at each moment, the system passes from a state to a unique
successor state.
Over a succession of moments the system passes through a succession of states,
called a trajectory. Figure 1(c) shows these successions of transitions.
The first critical feature of autonomous Boolean networks is this: since there
is a finite number of states, the system must eventually reenter a state previously
encountered; thereafter, since the system is deterministic and must always pass
from a state to the same successor state, the system will cycle repeatedly around
this state cycle. These state cycles are the dynamical attractors of the Boolean
network. The set of states flowing into one state cycle or lying on it constitutes the
basin of attraction of that state cycle. The length of a state cycle is the number of
states on the cycle, and can range from 1 for a steady state to 2^N.
Any such network must have at least one state cycle attractor, but may have
more than one, each draining its own basin of attraction. Further, since each state
drains into only one state cycle, the set of state cycles are the dynamical attractors
of the system, and their basins partition the 2^N state space of the system.
The simple Boolean network in Figure 1(a) has three state cycle attractors, shown in Figure 1(c).
Each is a discrete alternative recurrent asymptotic pattern of activities of the N
elements in the network. Left to its own, the system eventually settles down to one
of its state cycle attractors and remains there.
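The three-element network of Figure 1 is small enough to simulate exhaustively. A minimal sketch using the rules given in the text (element 1 computes AND of elements 2 and 3; elements 2 and 3 each compute OR of the other two) that enumerates all eight states and sorts them into the basins of their attractors:

```python
from itertools import product

# Rules from Figure 1(a): element 1 is AND of elements 2 and 3;
# elements 2 and 3 are each OR of the other two elements.
def step(state):
    s1, s2, s3 = state
    return (s2 & s3, s1 | s3, s1 | s2)

def attractor(state):
    """Iterate the synchronous update until a state repeats; return the
    state cycle the trajectory has entered."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return frozenset(seen[seen.index(state):])

# Group all 2^3 = 8 states into the basins of their attractors.
basins = {}
for s in product((0, 1), repeat=3):
    basins.setdefault(attractor(s), []).append(s)

cycles = set(basins)
```

Running this recovers exactly the picture in Figure 1(c): two steady states, (000) and (111), plus one two-state cycle (001) <-> (010); the basin of (111) holds five of the eight states, while (000) is an isolated state draining only itself.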
The stability of attractors to minimal perturbation may differ. A minimal per-
turbation in a Boolean network consists in transiently "flipping" the activity of an
element to the opposite state. Consider Figure 1(c). The first state cycle is a steady
state, or state cycle of length one, (000) which remains the same over time. Tran-
sient flipping of any element to the active state, e.g., (100), (010), or (001), causes the
system to move to one of the remaining two basins of attraction. Thus the (000)
state cycle attractor is unstable to any perturbations. In contrast, the third state
cycle is also a steady state (111). But it remains in the same basin of attraction for
any single perturbation (011), (101), or (110). Thus this attractor is stable to all
possible minimal perturbations.
A structural perturbation is a permanent "mutation" in the connections or
Boolean rules in the Boolean network. In Figure 1(d) I show the result of mutating
the rule governing element 2 from the OR function to the AND function. As you
can see, this alteration has not changed state cycle (000) or state cycle (111),
but has altered the second state cycle. In addition, state cycle (000) which was
an isolated state now drains a basin of attraction and is stable to all minimal
perturbations, while (111) has become an isolated state and is now unstable to all
minimal perturbations.
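This structural mutation is also easy to check by simulation. A self-contained sketch of the mutant network of Figure 1(d), with element 2's rule changed from OR to AND and the other rules as in the text:

```python
# Figure 1(d): element 2's rule is mutated from OR to AND; the rules for
# elements 1 (AND) and 3 (OR) are unchanged.
def step_mutant(state):
    s1, s2, s3 = state
    return (s2 & s3, s1 & s3, s1 | s2)

def settles_to(state, steps=10):
    for _ in range(steps):
        state = step_mutant(state)
    return state

# (000) and (111) are still steady states of the mutant network ...
still_fixed = (step_mutant((0, 0, 0)) == (0, 0, 0) and
               step_mutant((1, 1, 1)) == (1, 1, 1))
# ... but now every single-flip perturbation of (000) returns to it,
stable_000 = all(settles_to(p) == (0, 0, 0)
                 for p in [(1, 0, 0), (0, 1, 0), (0, 0, 1)])
# ... while no single-flip perturbation of (111) returns to (111).
unstable_111 = all(settles_to(p) != (1, 1, 1)
                   for p in [(0, 1, 1), (1, 0, 1), (1, 1, 0)])
```

A one-rule mutation thus leaves both steady states in place yet inverts their stability, which is precisely the kind of change that shapes the adaptive landscape discussed below.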
To summarize, the following properties of autonomous Boolean networks are of
immediate interest:
1. The number of states around a state cycle is called its length. The length can
range from 1 state for a steady state to 2^N states.
2. The number of alternative state cycles. At least one must exist. But a maximum
of 2^N might occur. These are the permanent asymptotic alternative behaviors
of the entire system.
3. The sizes of the basins of attraction drained by the state cycle attractors.
4. The stability of attractors to minimal perturbation, flipping any single element
to the opposite activity value.
5. The changes in dynamical attractors and basins of attraction due to mutations
in the connections or Boolean rules. These changes will underlie the character of
the adaptive landscape upon which such Boolean networks evolve by mutation
to the structure and rules of the system.
Boolean networks are discrete dynamical systems. The elements are either ac-
tive or inactive. The major difference between a continuous and a discrete deter-
ministic dynamical system is that two trajectories in a discrete system can merge.
To be concrete, Figure 1(c) shows several instances where more than one state
converge upon the same successor state.
"logic," and ask whether orderly behavior emerges nevertheless. Note that such be-
havior is occurring in a parallel-processing network. All elements compute their next
activities at the same moment. If we find order in random networks, then "random"
parallel networks with random logic have order despite an apparent cacophony of
structure and logic.
the ends of the tails. On the order of N^(1/2) of the N elements
lie on loops. Each separate loop has its own dynamical behavior and cannot
influence the other structurally isolated loops. Thus such a system is structurally
modular. It is composed of separate, isolated subsystems. The overall behavior
of such systems is the product of the behaviors of the isolated systems. As
Table 1 shows, the median lengths of state cycles increase rather slowly as N
increases, the number of attractors increases exponentially as N increases, and
their stability is moderate. There are four Boolean functions of K = 1 input,
"yes," "not," "true," and "false." The last two functions are constantly active,
or inactive. The values in Table 1 assume that only the Boolean functions "yes"
and "not" are utilized in K = 1 networks. When all four functions are allowed,
most isolated loops fall to fixed states, and the dynamical behavior is dominated
by those loops with no "true" or "false" functions assigned to elements of the
loop. Flyvbjerg and Kjaer16 and Jaffee26 have derived detailed results for this
analytically tractable case.
Next consider a system of several binary variables, each receiving inputs from
two or three of the other variables, and each active at the next moment if any one
of its inputs is active at the current moment (see Figure 2). That is, each element
is governed by the OR function on its inputs. As shown in Figure 2, this small
network has feedback loops. Now the consequence of the fact that all elements are
governed by the OR function on their inputs is that if a specific element is cur-
rently in the "1" state, at the next moment all of those elements that it regulates
are guaranteed or FORCED to be in the "1" state. Thus the "1" value is guar-
anteed to propagate from any initially active element in the net, iteratively to all
"descendents" in the net. But the net has loops; thus the guaranteed "1" value
cycles around such a loop. Once the loop has "filled up" with "1" values at each
element, the loop remains in a fixed state with "1" at each element in the loop, and
cannot be perturbed by outside influences of other inputs into the loop. Further the
"fixed" "1" values propagate to all descendents of the feedback loop, fixing them in
the "1" value as well. Such circuits are called forcing loops and descendent forcing
structures.17,18,19,30,31,32,34,35,36 Note that the fixed behavior of such a part of the
network provides walls of constancy. No signal can pass through elements once they
are frozen in their forced values.
[Figures 2 and 3 appear here: a small network in which each element is governed
by the OR function on its inputs (Figure 2), and a forcing structure whose
elements are governed by canalizing functions (OR, AND, IF, NOT IF), shown with
their truth tables (Figure 3).]
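The filling-up argument can be watched directly in a tiny all-OR network. The wiring below is hypothetical (a three-element feedback loop in which each loop element takes the other two as inputs, plus two downstream "descendants"), chosen only to illustrate the propagation, not taken from Figure 2:

```python
# Hypothetical wiring: elements 0, 1, 2 form a feedback loop (each takes
# the other two loop elements as inputs); 3 and 4 are downstream
# descendants.  Every element computes OR over its inputs.
inputs = {0: [1, 2], 1: [2, 0], 2: [0, 1], 3: [2], 4: [3]}

def step(state):
    return tuple(int(any(state[j] for j in inputs[i]))
                 for i in range(len(state)))

state = (1, 0, 0, 0, 0)      # one transiently active element in the loop
for _ in range(10):
    state = step(state)
frozen = state               # the loop fills with 1s, then the 1s
                             # propagate to all descendants and stay
```

Within a few steps the loop has "filled up" with 1 values, the descendants follow, and the all-1 state is a fixed point: the forced values can never be undone by further updates.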
The limitation to the OR function is here made only to make the picture clear.
In Figure 3 I show a network with a forcing structure in which a 1 state at some
specific elements force a descendant element to be in the 0 state, which in turn
forces its descendent element to be in the 1 state. The key, and defining feature, of
a forcing structure in a Boolean network is that at each point, a single element has a
single state which can force a descendent element to a specific state regardless of the
activities of other inputs. Propagation of such guaranteed, or forced, states occurs
via the forcing connections in the network. For a connection between two regulated
elements to be classed as "forcing," the second element must be governed by a
canalizing Boolean function, and the first element, which is an input to the second
element, must itself directly or indirectly (i.e., via K = 1 input connections) be
governed by a canalizing Boolean function, and the value of the first element which
can be "guaranteed" must be the value of the first element which itself guarantees
the activity of the second element. Clearly a network of elements governed by the
OR function meets these requirements. More generally, they create a transitive
relation such that if A forces B and B forces C, then A indirectly forces C via B.
Guaranteed, or forced, values must propagate down a connected forcing structure.
Large networks of N switching elements, each with K = 2 inputs drawn at
random from among the N, and each assigned at random one of the 2^(2^K) Boolean
switching functions on K inputs, are random disordered systems. Nevertheless, they
can exhibit markedly ordered behavior with small attractors, with homeostasis and,
as, we see below, with highly correlated fitness landscapes. The reason for this is
that large forcing structures exist in such networks. The forcing structures form
a large connected interconnected web of components which stretches or percolates
across the entire network.18,19,23,30,31,34,35,36,37,38 This web falls to a fixed state,
each element frozen in its forced value and leaves behind functionally isolated islands
of elements which are not part of the forcing structure. These isolated islands are
each an interconnected cluster of elements which communicates internally. But the
island clusters are functionally isolated from one another because signals cannot
pass through the walls of constancy formed by the percolating forcing structure.
The occurrence of such walls of constancy due to the percolation of extended
forcing structures depends upon the character of the switching network, and in
particular on the number of variables which are inputs to each variable, that is,
upon the connectivity of the dynamical system. Large connected forcing structures
form, or "percolate," spontaneously in K = 2 networks because a high proportion
of the 16 possible Boolean functions of K = 2 inputs belong to the special class
of "canalizing Boolean functions." If two elements regulated by canalizing Boolean
functions are coupled, one as the input to the second, then the probability that the
connection is a "forcing connection" is 0.5. This means that in a large network all
of whose elements are regulated by canalizing Boolean functions, on average half of
the connections are forcing connections.
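The claim that a high proportion of the 16 possible K = 2 functions are canalizing can be verified by enumeration. The sketch below counts 14 of the 16 as canalizing, everything except EXCLUSIVE OR and its negation (the two constant functions count as trivially canalizing under this test):

```python
from itertools import product

def canalizing(f):
    """f maps (a, b) -> 0/1.  Canalizing: some one input, at one of its
    two values, fixes the output regardless of the other input."""
    for v in (0, 1):
        if f[(v, 0)] == f[(v, 1)]:   # input a = v forces the output
            return True
        if f[(0, v)] == f[(1, v)]:   # input b = v forces the output
            return True
    return False

# All 16 K = 2 Boolean functions as truth-table dictionaries.
tables = [dict(zip([(0, 0), (0, 1), (1, 0), (1, 1)], outs))
          for outs in product((0, 1), repeat=4)]
n_canalizing = sum(canalizing(f) for f in tables)
```

The two exceptions are exactly the functions with no forcing value on either input, which is why random K = 2 networks are so rich in forcing connections.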
The expected size and structure of the resulting forcing structures is a math-
ematical problem in random graph theory.11,12,22,32,33,36,37,38 Percolation "thresh-
olds" occur in random graphs and determine when large connected webs of elements
will form. Below the threshold such structures do not form; above the threshold they
do. The percolation threshold for the existence of extended forcing structures in a
random Boolean network requires that the ratio of forcing connections to elements
be 1.0 or greater.31,33,36,37,38 Thus in large networks using elements regulated by
canalizing functions on two inputs, half the 2N connections are forcing. Therefore
the ratio of forcing connections to elements, N/N = 1, is high enough that extended
large forcing structures form. More generally, for K = 2 random networks and net-
works with K > 2, but restricted to canalizing functions, such forcing structures
form and literally crystallize a frozen state which induces orderly dynamics in the
entire network.
Because the percolation of a frozen component also accounts for the emergence
of order due to homogeneity clusters discussed just below, I defer for a moment de-
scribing how the frozen component due to either forcing structures or homogeneity
clusters induces orderly dynamics.
Low connectivity is a sufficient, but not a necessary condition for orderly behavior in
disordered switching systems. In networks of high connectivity, order emerges with
proper constraints on the class of Boolean switching rules utilized. One sufficient
condition is constraint to the class of canalizing Boolean functions and the perco-
lation of forcing structures across the network. But another sufficient condition for
order exists.
Consider a Boolean function of four input variables. Each input can be on or
off; hence the Boolean function must specify the response of the regulated switching
element for each of the 2^4 = 16 combinations of values of the four inputs. Among the 16
"responses," the 1 or the 0 response might occur equiprobably, or one may occur
far more often than the other. Let P be the fraction of the 2^K positions in the
function with a 1 response. If P is well above 0.5, and approaches 1.0, then most
combinations of activities of the four variables lead to a 1 response. The deviation
of P above 0.5 measures the "internal homogeneity" of the Boolean function.
In Figure 4 I show a two-dimensional lattice of points, each of which is an
on/off variable, and each of which is regulated by its four neighboring points. Each
is assigned at random one of the possible Boolean functions on four inputs, subject
to the constraint that the fraction of "1" values in that Boolean function is a
specified percentage, P, P > 0.5.
Derrida and Stauffer,6 Weisbuch and Stauffer,62 and de Arcangelis,3 summa-
rized in Stauffer57,58 and in Weisbuch,63 have studied two-dimensional and three-
dimensional lattices with nearest-neighbor coupling, and found that if P is larger
than a critical value, Pc, then the dynamical behavior of the network breaks up
into a connected "frozen" web of points fixed in the "1" value, and isolated islands
of connected points which are free to oscillate from 0 to 1 to 0, but are functionally
cut off from other such islands by the frozen web.
In contrast, if P is closer to 0.5 than Pc, then such a percolating web of points
fixed in "1" values does not form. Instead small isolated islands of frozen elements
form, and the remaining lattice is a single connected percolating web of elements
which oscillate between 1 and 0 in complex temporal cycles. In this case, transiently
altering the value, 1 or 0, of one point can propagate via neighboring points and
influence the behavior of most of the oscillating elements in the lattice.
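A direct simulation makes this P-dependence visible. Everything specific below (a 16 x 16 toroidal lattice, the warmup and measurement windows, the random seed, and the two sample values of P) is an arbitrary choice for the sketch, not a value from the studies cited; the function reports the fraction of sites that never change during the final measurement window.

```python
import random

def frozen_fraction(P, L=16, warmup=80, window=20, seed=0):
    """Fraction of sites in a random Boolean lattice that never change
    during the final `window` synchronous updates.  Each site is a 0/1
    variable regulated by its four nearest neighbors (periodic boundary)
    through a random truth table whose responses are 1 with probability P."""
    rng = random.Random(seed)
    rules = [[[int(rng.random() < P) for _ in range(16)]
              for _ in range(L)] for _ in range(L)]
    state = [[rng.randint(0, 1) for _ in range(L)] for _ in range(L)]

    def step(s):
        nxt = [[0] * L for _ in range(L)]
        for i in range(L):
            for j in range(L):
                # encode the four neighbor values as a 4-bit table index
                idx = (s[(i - 1) % L][j] * 8 + s[(i + 1) % L][j] * 4 +
                       s[i][(j - 1) % L] * 2 + s[i][(j + 1) % L])
                nxt[i][j] = rules[i][j][idx]
        return nxt

    for _ in range(warmup):          # let transients die out
        state = step(state)
    changed = [[False] * L for _ in range(L)]
    for _ in range(window):          # then watch for remaining activity
        new = step(state)
        for i in range(L):
            for j in range(L):
                changed[i][j] |= new[i][j] != state[i][j]
        state = new
    return sum(not c for row in changed for c in row) / (L * L)

high_bias = frozen_fraction(P=0.95)   # strongly biased toward "1"
low_bias = frozen_fraction(P=0.60)    # closer to the unbiased 0.5
```

With the bias well above the critical value most of the lattice freezes, while nearer 0.5 a much larger oscillating component survives, the two phases the text contrasts.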
These facts lead us to a new idea: The critical value of P, Pc, demarks a kind
of "phase transition" in the behavior of such a dynamical system. For P closer to
[Figure 4 appears here: a 24 x 24 lattice of sites, the number printed at each
site giving that site's periodicity on the state cycle attractor; the sites
marked "1" are frozen and form a cluster percolating across the lattice.]
FIGURE 4 Two-dimensional lattice of sites, each a binary state "spin" which may point
up or down. Each binary variable is coupled to its four neighbors and is governed by a
Boolean function on those four inputs. The number at each site shows the periodicity
of the site on the state cycle attractor. Thus "1" means a site frozen in the active or
inactive state. Note that frozen sites form a frozen cluster that percolates across the
lattice. Increasing the measure of internal homogeneity, P, the bias in favor of a "1" or
a "0" response by any single spin, above a critical value, Pc, leads to percolation of a
"frozen" component of spins, fixed in the 1 or 0 state, which spans the lattice leaving
isolated islands of spins free to vary between 0 and 1.
For P closer to 0.5 than Pc, the lattice of on/off variables, or two state "spins," has no percolating
frozen component. For P closer to 1.0 than Pc, the lattice of on/off variables, or
two state "spins," does have a large frozen component which percolates across the
space.
The arguments for the percolation of a frozen component for P > Pc do not
require that the favored value of each on/off "spin" variable in the lattice be 1.
The arguments carry over perfectly if half the on/off variables respond with high
probability, P > Pc, by assuming the 1 value and the other half respond with
P > Pc with the 0 value. In this generalized case, in the frozen web of "spins"
in the lattice, each frozen spin is frozen in its more probable value, 1 or 0. Thus,
for arbitrary Boolean lattices, P > Pc provides a criterion which separates two
drastically different behaviors, chaotic versus ordered.
The value of P at which this percolation and freezing out occurs depends
upon the kind of lattice, and increases as the number of neighbors to each point in
the lattice increases. On a square lattice for K = 4, Pc is 0.28.57,58,59 On a cubic
172 Stuart A. Kauffman
lattice, each point has six neighbors, and Pc is greater than for square lattices. This
reflects the fact that the fraction of "bonds" in a lattice which must be in a fixed
state for that fixed value to percolate across the lattice, depends upon the number
of neighbors to each point in the lattice.
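This freezing behavior can be seen in a small simulation. The sketch below is my own construction, not Kauffman's code: it builds a square lattice of K = 4 random Boolean rules whose truth-table entries are biased toward 1 with probability P, runs past a transient, and measures the fraction of sites that never change over an observation window. Pushing P toward 1.0 freezes essentially the whole lattice, while P near 0.5 leaves most sites churning.

```python
import random

def make_lattice(L, P, seed=0):
    # Each site gets a random Boolean function of its 4 neighbors: a
    # 16-entry lookup table whose entries are 1 with probability P
    # (the internal-homogeneity bias).
    rng = random.Random(seed)
    return [[tuple(1 if rng.random() < P else 0 for _ in range(16))
             for _ in range(L)] for _ in range(L)]

def step(state, rules, L):
    # Synchronous update; periodic boundary conditions.
    new = [[0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            idx = (state[(i - 1) % L][j] << 3 | state[(i + 1) % L][j] << 2
                   | state[i][(j - 1) % L] << 1 | state[i][(j + 1) % L])
            new[i][j] = rules[i][j][idx]
    return new

def frozen_fraction(L, P, transient=50, window=30, seed=0):
    # Fraction of sites that never change state over the window.
    rng = random.Random(seed + 1)
    rules = make_lattice(L, P, seed)
    state = [[rng.randint(0, 1) for _ in range(L)] for _ in range(L)]
    for _ in range(transient):
        state = step(state, rules, L)
    changed = [[False] * L for _ in range(L)]
    for _ in range(window):
        nxt = step(state, rules, L)
        for i in range(L):
            for j in range(L):
                if nxt[i][j] != state[i][j]:
                    changed[i][j] = True
        state = nxt
    return sum(not changed[i][j] for i in range(L) for j in range(L)) / L ** 2
```

Here "frozen" is judged over a finite window rather than on the exact attractor, which is a crude but serviceable proxy for small lattices.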
Let me call such percolating frozen components for P > Pc homogeneity clus-
ters to distinguish them from extended forcing structures. I choose this name be-
cause freezing in this case depends upon the internal homogeneity of the Boolean
functions used in the network. That the two classes of objects are different in gen-
eral is clear: In forcing structures the characteristic feature is that at each point a
single value of an element alone suffices to force one or more descendent elements to
their own forced values. In contrast, homogeneity clusters are more general. Thus,
consider two pairs of elements, A1, A2 and B1, B2. A1 and A2 might receive inputs
from both B1 and B2 as well as other elements, while B1 and B2 receive inputs
from A1 and A2 as well as other elements. But due to the high internal homogeneity,
P > Pc, of the Boolean functions assigned to each, simultaneous 1 values by both
A1 and A2 might jointly guarantee that B1 and B2 each be active regardless of the
activities of other inputs to B1 and B2. At the same time, simultaneous 1 values
by both B1 and B2 might jointly guarantee that A1 and A2 be active regardless
of the activities of other inputs to A1 and A2. Once the four elements are jointly
active, they mutually guarantee their continued activity regardless of the behavior
of other inputs to the four. They form a frozen component. Yet it is not a forcing
component since the activity of two elements, A1 and A2, or B1 and B2, must be
jointly assured to guarantee the activity of any single element.
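The four-element cluster just described can be checked exhaustively. In this sketch the rules are hypothetical ones chosen to illustrate the argument: each element has one "external" input e besides the pair shown, and the pair A1 AND A2 (or B1 AND B2) alone guarantees activity regardless of e.

```python
from itertools import product

def B_rule(a1, a2, e):
    # e matters only when A1 AND A2 fails: no single input forces B
    return 1 if (a1 and a2) else e

def A_rule(b1, b2, e):
    return 1 if (b1 and b2) else e

def update(state, ext):
    a1, a2, b1, b2 = state
    ea1, ea2, eb1, eb2 = ext
    return (A_rule(b1, b2, ea1), A_rule(b1, b2, ea2),
            B_rule(a1, a2, eb1), B_rule(a1, a2, eb2))

# Once all four elements are active, they stay active under every
# combination of external inputs: a frozen component.
frozen = all(update((1, 1, 1, 1), ext) == (1, 1, 1, 1)
             for ext in product((0, 1), repeat=4))
```

Note that B_rule(1, 0, e) still depends on e, so no single element alone forces another; the component is frozen without being a forcing structure.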
While there appear to be certain differences between forcing structures and
homogeneity clusters, those differences are far less important than the fact that,
at present, the two are the only established means to obtain orderly dynamics in
large, disordered Boolean networks.
Whether percolation of a frozen phase is due to an extended forcing structure
or to a homogeneity cluster due to P > Pc, the implications include these:
1. If a frozen phase does not form:
a. The attractors in such a system are very large, and grow exponentially as
the number of points in the lattice, N, increases. Indeed, the attractors are
so large that the system can be said to behave chaotically.
b. As indicated, a minor alteration in the state of the lattice, say, "flipping"
one element from the 1 to the 0 value at a given instant, propagates al-
terations in behavior throughout the system. More precisely, consider two
identical lattices which differ only in the value of one "spin" at a moment,
T. Let each of the two lattices behave dynamically according to their identical
Boolean rules. Define the "damage" caused by the initial "spin flip" to be
the total number of sites in the lattices which at the succession of time mo-
ments are now induced to be in different states, 1 or 0. Then for P closer to
0.5 than Pc, such damage propagates across the lattice with a finite speed,
and a large fraction of the sites are damaged.6,48,56,57,58,62 Propagation of
neighbor of all those which alter the beginning or end of a single "input" connection,
or a single "bit" in a single Boolean function.
In considering program space I defined a fitness landscape as the distribution
over the space of the figure of merit, consisting in a measurable property of those
programs. This leads us to examine the statistical features of such fitness landscapes,
including their correlation structure, the numbers of local optima, the lengths of walks
to optima via fitter 1-mutant variants, the number of optima accessible from any
point, and so forth. Similarly, in considering adaptation in Boolean network space,
any specific measurable property of such networks yields a fitness landscape over
the space of systems. Again we can ask what the structure of such landscapes looks
like.
I shall choose to define the fitness of a Boolean network in terms of a steady
target pattern of activity and inactivity among the N elements of the network. This
target is the (arbitrary) goal of adaptation. Any network has a finite number of
state cycle attractors. I shall define the fitness of any specific network by the match
of the target pattern to the closest state on any of the net's state cycles. A perfect
match yields a normalized fitness of 1.0. More generally, the fitness is the fraction
of the N elements which match the target pattern.
In previous work, Kauffman and Levin studied adaptive evolution on fully
uncorrelated landscapes. More recently,44,43 my colleagues and I introduced and
discussed a spin-glass-like family of rugged landscapes called the NK model. In this
family, each site, or spin, in a system of N sites, makes a fitness contribution which
depends upon that site, and upon K other randomly chosen sites. Each site has
two alternative states, 1 or 0. The fitness contribution of each site is assigned at
random from the uniform distribution between 0.0 and 1.0 for each combination of
the 2^(K+1) states of the K + 1 sites which bear on that site. The fitness of a given
configuration of N site values, (110100010), is defined as the mean of the fitness
contributions of each of the sites. Thus, this model is a kind of K-spin spin glass,
in which an analogue of the energy of each spin configuration depends, at each
site, on interactions with K other sites. In this model, when K = 0, the landscape
has a single peak, the global optimum, and the landscape is smoothly correlated.
When K is N - 1, each site interacts with all sites, and the fitness landscape is fully
random. This limit corresponds to Derrida's random-energy spin glass model.4,5,6
Two major regimes exist, K proportional to N, and K of order 1. In the former,
landscapes are extremely rugged, and local optima fall toward the mean of the space
as N increases. In the latter, there are many optima, but they do not fall toward
the mean of the space as N increases. For K = 2, the highest optima cluster near
one another.
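A minimal sketch of the NK model as defined above: random epistatic inputs and uniform [0, 1) contribution tables, with the fitness of a configuration the mean of per-site contributions, and an adaptive walk through fitter 1-mutant variants to a local optimum. The function names are mine, not from the text. For K = 0 every walk ends at the single global peak; for larger K many local optima appear.

```python
import random
from itertools import product

def make_nk(N, K, seed=0):
    """Random NK landscape: site i contributes a uniform [0, 1) fitness
    value for each combination of site i and its K epistatic inputs."""
    rng = random.Random(seed)
    deps = [[i] + rng.sample([j for j in range(N) if j != i], K)
            for i in range(N)]
    tables = [{bits: rng.random() for bits in product((0, 1), repeat=K + 1)}
              for _ in range(N)]
    def fitness(config):
        return sum(tables[i][tuple(config[j] for j in deps[i])]
                   for i in range(N)) / N
    return fitness

def hill_climb(fitness, N, seed=0):
    """Adaptive walk via fitter 1-mutant variants to a local optimum."""
    rng = random.Random(seed)
    config = [rng.randint(0, 1) for _ in range(N)]
    improved = True
    while improved:
        improved = False
        for i in rng.sample(range(N), N):   # try sites in random order
            trial = config[:]
            trial[i] ^= 1                   # flip one "spin"
            if fitness(trial) > fitness(config):
                config, improved = trial, True
    return config, fitness(config)
```

With K = 0 the sites are independent, so walks from different starting points converge on the same global optimum; with K near N - 1 the landscape is effectively random and walks stop at nearby local optima.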
Such rugged landscapes exhibit a number of general properties. Among them,
there is a "universal law" for long jump adaptation. In "long jump" adaptation
members of an adapting population can mutate a large number of genes at once,
hence jump a long way across the landscape at once. Frame-shift mutations are
examples. In long jump adaptation the waiting time to find fitter variants doubles
after each fitter variant is found, hence the mean number of improvement steps,
S, grows as the logarithm base 2 of the number of generations. Further, there is
FIGURE 5 (a) Tests of the "Universal Law" for long jump adaptation. Figures show
cumulative number of improvement steps following mutations of half the connections
in K = 2, N = 50 and N = 100 element Boolean nets in each member of the
population—except for a "current best" place holder—plotted against the logarithm of
the generation at which the improvement occurred. Each walk yields a sequence of
generations at which an improvement step arose. Means of observed values are plotted,
as well as theoretical expectations. (b) 1/4 of all "bits" in the Boolean functions within
each member of the population of N = 50 or N = 100 networks were reversed at
each generation as a "long jump" mutation in the logic of the network. From Origins
of Order: Self Organization in Evolution by S. A. Kauffman. Copyright © 1990 by
Oxford University Press, Inc. Reprinted by permission.
[Figure 6 and Figure 7 plots: mean fitness versus generation (0 to 100), panels for K = 2 and K = 10 networks with F = 1, 2, 5 and C = 1, 2, 5 mutations per generation.]
FIGURE 7 (a) As in Figure 5, except that K = 10. (b) Same as (a), except that the
connections were mutated in networks. From Origins of Order: Self Organization
in Evolution by S. A. Kauffman. Copyright © 1990 by Oxford University Press, Inc.
Reprinted by permission.
FIGURE 8 The fitnesses of 1, 2, and 5 mutant variants of the fittest network found
after adaptive hill climbing in K = 2 networks (a-b), or K = 10 networks (c-d). "F"
refers to mutations to "bits" in Boolean functions, "C" to mutations of connections. From
Origins of Order: Self Organization in Evolution by S. A. Kauffman. Copyright ©
1990 by Oxford University Press, Inc. Reprinted by permission.
As the complexity of the entities under selection increases, here the number of binary
switching variables in a disordered Boolean network, the attainable optima again
fall toward the mean of the space. We do not know at this stage just how general this
complexity catastrophe limiting the power of selection when operating on complex
systems may be, but it appears likely to be a powerful factor in evolution. Finally,
Boolean networks of different connectivities, K = 2 and K = 10, clearly adapt
on radically different landscapes. The capacity to attain and maintain high fitness
depends upon landscape structure, mutation rate, and coevolutionary couplings of
landscapes. It follows that dynamical systems in different classes, constructed in
different broad ways, can have very different capacities to adapt. Tentatively, it
appears that Boolean nets of low connectivity are likely to adapt more readily than
those of high connectivity.
Among the themes to be investigated in understanding the relation between
self-organization and selection, is the extent to which selection can achieve systems
whose behavior is very untypical of those in the ensemble in which adaptive evolu-
tion is occurring. In the current context, can selection operate on Boolean networks
with K = 20 inputs and N = 10,000, and achieve networks with short stable at-
tractors? The answer is unknown. But since the generic properties of this class of
random Boolean networks include attractors which scale exponentially in N, and
are grossly unstable to minimal perturbations, one doubts strongly that selection
could achieve such systems within the N = 10,000 K = 20 ensemble. But, if the
structure of such networks governs the sizes and stability of their attractors, it
also governs the ruggedness of the fitness landscapes upon which they evolve. If
selection can "tune" K in such networks, or bias the choice of Boolean functions in
such networks, then selection can change the ensemble being explored by evolution.
Such changes would tune the landscape structure of the systems, hence their evolv-
ability. The fact that the K = 2 and canalizing ensemble fits so many features of
organisms, and that organisms are themselves now clearly evolvable, suggests that
this ensemble may itself have been achieved by selection in part to achieve evolv-
ability. In the next section we turn to ask what features of fitness landscapes, and
of the couplings through which landscapes deform as partners adapt, abet coevolution.
moves by one coevolutionary partner cause the fitness landscapes of its partners
to deform more or less drastically. It is a story of coupled dancing landscapes. On a
fixed fitness landscape there is the analogue of a potential function: the fitness at
each point. In coevolution, no such potential function is present. Thus we can frame
the following questions: 1) How are fitness landscapes coupled? 2) What kinds of
couplings between landscapes allow the partners to dance happily and typically
achieve "high fitness?" 3) Might there be evolutionary processes which alter the
couplings among landscapes and the landscape structure of each partner, such that
the entire system coevolves "well" or optimally in some sense?
Answers are not known, of course. I describe briefly some preliminary work
carried out with my colleague Sonke Johnsen using the spin-glass-like NK model of
fitness landscapes. As noted briefly above, the NK model consists of N spins, each
in two states, 1 or 0. Each spin makes a "fitness contribution" to the "organism"
which depends upon the value at that spin site, and at K other randomly chosen
sites. In our coevolutionary model, we consider a system with S organisms, one for
each of S species. Each species interacts with R neighboring species. Each site in
each species makes a fitness contribution which depends upon K sites within that
species member, and "C" sites in each of the R species with which it interacts. The
fitness contribution of each site therefore depends upon K + 1 + R*C sites, each of
which can be in the 1 or 0 state. The model assigns to each site a fitness contribution
at random from the uniform interval between 0.0 and 1.0, for each of the 2^(K+1+R*C)
combinations of these site values. In an extension to the model, each species also
interacts with an external world of N sites, of which W affect each of the species'
own sites. Thus, the coevolutionary model is a kind of coupled spin system. Each
species is represented by a collection of N spins. Spins are K coupled within each
species, and C coupled between species. The fitness of any species, whose current
state is given by the values of its N spins, depends upon the states of those spins
and those in its R neighbors which impinge upon it, and is the mean of the fitness
contribution of the species' own N sites.
Consider a "square" 10 x 10 ecosystem with 100 species each of which interacts
with its four neighbors. Corner species interact with only two neighbors, edge species
interact with three neighbors. Each species "plays" in turn by flipping each of its
N spins, one at a time, and ascertaining if any 1-mutant variant is fitter than the
current spin configuration of that species. If so, the species randomly chooses one of
the fitter variants and "moves" there. Each of the 100 players plays in turn, in order.
100 plays constitutes an ecosystem generation. After each ecosystem generation, a
species may have changed spin configuration, or may not have changed. If the
species changed, color it blue. If it remained fixed, color it red. Over time the
system will continue to change unless all members stop changing, and the whole
system becomes frozen in a "red" state. Such a state corresponds to a local Nash
equilibrium in game theory. Each player is at a local (1-mutant) optimum consistent
with the local optima of its R neighbors.
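The coevolutionary dynamics can be sketched in miniature. The code below is a simplification of my own: it puts S species on a ring rather than the 10 x 10 grid, and replaces the stored contribution tables with a deterministic hash-based stand-in. Each species in turn moves to a random fitter 1-mutant variant; the return value is the ecosystem generation at which the system froze into a local Nash equilibrium, or None if it kept changing.

```python
import random

def contrib(key):
    # Deterministic stand-in for a stored table of uniform [0, 1) fitness
    # contributions, one entry per (species, site, relevant bit values).
    return random.Random(hash(key)).random()

def fitness(sp, configs, deps):
    N = len(configs[sp])
    return sum(contrib((sp, site) + tuple(configs[d_sp][d_site]
                                          for d_sp, d_site in deps[sp][site]))
               for site in range(N)) / N

def coevolve(S, N, K, C, seed=0, max_gen=200):
    """S species on a ring, each N spins; a site's contribution depends on
    itself, K other sites of its own species, and C sites in each of its
    two neighboring species."""
    rng = random.Random(seed)
    deps = [[[(sp, site)]
             + [(sp, j) for j in rng.sample(
                 [j for j in range(N) if j != site], K)]
             + [(nb, j) for nb in ((sp - 1) % S, (sp + 1) % S)
                for j in rng.sample(range(N), C)]
             for site in range(N)] for sp in range(S)]
    configs = [[rng.randint(0, 1) for _ in range(N)] for _ in range(S)]
    for gen in range(1, max_gen + 1):
        moved = False
        for sp in range(S):                    # each species plays in turn
            base = fitness(sp, configs, deps)
            better = []
            for i in range(N):
                configs[sp][i] ^= 1
                if fitness(sp, configs, deps) > base:
                    better.append(i)
                configs[sp][i] ^= 1
            if better:                         # move to a random fitter variant
                configs[sp][rng.choice(better)] ^= 1
                moved = True
        if not moved:
            return gen   # frozen "red" state: a local Nash equilibrium
    return None
```

With C = 0 the species climb independent NK landscapes and must freeze; with C > 0 each move deforms the neighbors' landscapes, and whether and how fast a Nash equilibrium is reached depends on K relative to R*C, as described in the text.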
Recall that increasing K increases the ruggedness of these NK landscapes.
We find the following remarkable result: When K is large relative to R * C, then
over ecosystem generations frozen red regions form, grow, and percolate across the
ecosystem. At first these red frozen components leave behind blue islands of species
which continue to undergo coevolutionary change. Eventually, the entire system
becomes frozen in a red Nash equilibrium. In short, frozen components recur on
this larger scale of coupled spin systems, in direct analogy with those found in
Boolean networks. The number of ecosystem generations required for the frozen
component to spread across the ecosystem increases dramatically when K is less
than R * C.
Tuning the parameters of the coupled spin model, N, K, C, S, and the number
of sites which can "flip" or mutate at once in each species, not only tunes the
mean time to reach a Nash equilibrium, but also tunes the mean fitness of the
coevolving partners. While full results are not yet available, it appears that in any
model ecosystem, there is an optimal value of K. When K is too small relative to
R * C, the landscape of each partner is too "smooth," and the effect of altering a
site internal to a species upon its fitness is too small with respect to the impact
of site alterations in other species to withstand those exogenous perturbations to
landscape structure. The waiting time to reach the frozen Nash equilibrium is long,
and sustained fitness is low. Conversely, if K is too high, Nash equilibria are rapidly
attained, but the high K value implies many conflicting constraints; thus the fitness
of the local optima which comprise the Nash are low. Again, sustained fitness is
low. An optimal value of K optimizes the waiting time to find Nash equilibria such
that the sustained fitness is itself optimized.
It is also important that an evolutionary process guided by natural selection
acting on members of individual species may lead partners to "tune" K to the opti-
mum. For each partner in a system where each has a suboptimal or overoptimal K
value, any single partner improves its own sustained fitness by increasing or lowering
its K value toward the optimal value. Thus natural selection, acting on members
of individual species to tune the ruggedness of their own fitness landscapes, may
optimize coevolution for an entire coupled system of interacting adapting species.
Real coevolution confronts not only adaptive moves by coevolving partners, but
exogenous changes in the external "world" impinging upon each partner. The cou-
pled NK landscape model suggests that if each partner is occasionally shocked by
a change in its external world, then sustained fitness may be optimized by increas-
ing K slightly. In this case, the coevolving system as a whole tends to restore the
red frozen Nash equilibria more rapidly in the face of external perturbations which
destabilize the system.
Finally, it has been of interest to study the distribution of coevolutionary
avalanches unleashed by changing the external "world" of species when the entire
system is at a frozen Nash equilibrium. Small and large avalanches of coevolution-
ary change propagate across the system. To a first approximation, when the K
value is optimized to maximize sustained fitness, the distribution of avalanche sizes
appears to be linear in a log-log plot, suggesting a power law distribution. If so,
the self-optimized ecosystem may harbor a self-organized critical state of the kind
recently investigated by Bak in other contexts. Interestingly, the distribution of
such avalanches in these model ecosystems mirrors the distribution of extinction
events in the evolutionary record.
These results are first hints that coevolving systems may tune the structure of
their internal landscapes and the coupling between landscapes under the aegis of
natural selection such that the coupled system coadapts well as a whole. No mean
result this, if true.
SUMMARY
What kinds of dynamical systems harbor the capacity to accumulate useful varia-
tions, hence evolve? How do such systems interact with their "worlds," in the sense
of categorizing their worlds, acting upon those categorizations, and evolving as their
worlds, with other players, themselves evolve? No one knows. The following is clear.
Adaptive evolution, whether by mutation and selection, or learning, or otherwise,
occurs on some kind of "fitness landscape." This follows because adaptation or
learning is some kind of local search in a large space of possibilities. Further, in
any coevolutionary context, fitness landscapes deform because they are coupled.
The structure and couplings among landscapes reflect the kinds of entities which
are evolving and their couplings. Natural selection or learning may tune both such
structures and couplings to achieve systems which are evolvable.
A further point is clear. Complex, parallel-processing Boolean networks which
are disordered can exhibit ordered behavior. Such networks are reasonable models
of a large class of nonlinear dynamical systems. The attractors of such networks
are natural objects of interest. In the present article I have interpreted attractors
as "cell types." But equally, consider a Boolean network receiving inputs from an
external world. The attractors of a network are the natural classifications that the
network makes of the external world. Thus, if the world can be in a single state, yet
the network can fall to different attractors, then the network can categorize that
state of the world in alternative ways and respond in alternative ways to a single
fixed state of the external world. Alternatively, if the world can be in alternative
states, yet the network falls to the same attractor, then the network categorizes
the alternative states of the world as identical, and can respond in the same way.
In brief, and inevitably, nonlinear dynamical systems which interact with external
worlds classify and "know" their worlds.
Linking what we have discussed, and guessing ahead, I suspect that if we could
find natural ways to model coevolution among Boolean networks which received in-
puts from one another and external worlds, we would find that such systems tuned
their internal structures and couplings to one another so as to optimize something
like their evolvability. An intuitive bet is that such systems would achieve internal
structures in which the frozen components were nearly melted. Such structures live
on the edge of chaos, in the "liquid" interface suggested by Langton,49 where com-
plex computation can be achieved. In addition, I would bet that couplings among
entities would be tuned such that the red frozen Nash equilibria are tenuously held
to optimize fitness of all coevolving partners in the face of exogenous perturbations.
REFERENCES
1. Alberts, B., D. Bray, J. Lewis, M. Raff, K. Roberts, and J. D. Watson. Molec-
ular Biology of the Cell. New York: Garland, 1983.
2. Bak, P., C. Tang, and K. Wiesenfeld. "Self-Organized Criticality." Phys. Rev.
A 38(1) (1988):364-374.
3. De Arcangelis, L. "Fractal Dimensions in Three-Dimensional Kauffman Cellu-
lar Automata." J. Phys. A. Lett. 20 (1987):L369-L373.
4. Derrida, B., and H. Flyvbjerg. "Multivalley Structure in Kauffman's Model:
Analogy with Spin Glasses." J. Phys. A: Math. Gen. 19 (1986):L1003-L1008.
5. Derrida, B., and Y. Pomeau. "Random Networks of Automata: A Simple An-
nealed Approximation." Europhys. Lett. 1(2) (1986):45-49.
6. Derrida, B., and D. Stauffer. "Phase-Transitions in Two-Dimensional Kauff-
man Cellular Automata." Europhys. Lett. 2(10) (1986):739-745.
7. Derrida, B., and H. Flyvbjerg. "The Random Map Model: A Disordered
Model with Deterministic Dynamics." J. Physique 48 (1987):971-978.
8. Derrida, B., and H. Flyvbjerg. "Distribution of Local Magnetizations in Ran-
dom Networks of Automata." J. Phys. A. Lett. 20 (1987):L1107-L1112.
9. Eigen, M. "New Concepts for Dealing With the Evolution of Nucleic Acids."
In Cold Spring Harbor Symposia on Quantitative Biology, vol. LII. Cold
Spring Harbor Laboratory, 1987, 307-320.
10. Eigen, M., and P. Schuster. The Hypercycle, A Principle of Natural Self-
Organization. New York: Springer-Verlag, 1979.
11. Erdos, P., and A. Renyi. On the Random Graphs I, vol. 6. Debrecen, Hun-
gary: Inst. Math. Univ. Debreceniens, 1959.
12. Erdos, P., and A. Renyi. "On the Evolution of Random Graphs." Math. Inst.
Hung. Acad. Sci., Publ. No. 5, 1960.
13. Farmer, J. D., K. S. Kauffman, and N. H. Packard. "Autocatalytic Replica-
tion of Polymers." Physica 22D (1986):50-67.
14. Farmer, J. D., N. H. Packard, and A. Perelson. "The Immune System, Adap-
tation, and Machine Learning." Physica 22D (1986):187-204.
15. Feller, W. Introduction to Probability Theory and its Applications, vol. II, 2nd
edition. New York: Wiley, 1971.
16. Flyvberg, H., and N. J. Kjaer. "Exact Solution of Kauffman's Model with
Connectivity One." J. Phys. A. 21(7) (1988):1695-1718.
54. Rumelhart, D. E., J. L. McClelland, and the PDP Research Group. Parallel
Distributed Processing: Explorations in the Microstructure of Cognition, vols.
I and II. Cambridge, MA: Bradford, 1986.
55. Schuster, P. "Structure and Dynamics of Replication-Mutation Systems."
Physica Scripta 26B (1987):27-41.
56. Stanley, H. E., D. Stauffer, J. Kertesz, and H. J. Herrmann. Phys. Rev. Lett.
59 1987.
57. Stauffer, D. "Random Boolean Networks: Analogy with Percolation." Philo-
sophical Magazine B 56(6) (1987):901-916.
58. Stauffer, D. "On Forcing Functions in Kauffman's Random Boolean Net-
works." J. Stat. Phys. 40 (1987):789.
59. Stauffer, D. "Percolation Thresholds in Square-Lattice Kauffman Model." J.
Theor. Biol., in press.
60. Walter, C., R. Parker, and M. Ycas. J. Theor. Biol. 15 (1967):208.
61. Weisbuch, G. J. Phys. 48 (1987):11.
62. Weisbuch, G., and D. Stauffer. "Phase Transition in Cellular Random
Boolean Nets." J. Physique 48 (1987):11-18.
63. Weisbuch, G. Dynamics of Complex Systems: An Introduction to
Networks of Automata. Paris: InterEditions, 1989.
64. Wolfram, S. "Statistical Mechanics of Cellular Automata." Rev. Mod. Phys.
55 (1983):601.
Seth Lloyd
Division of Physics and Astronomy, California Institute of Technology, Pasadena, CA 91125
and the Santa Fe Institute, 1120 Canyon Road, Santa Fe, NM 87501
Valuable Information
QUANTIFYING EXPERIENCE
Consider how the set of genes for a particular species of clover changes from one
season to the next. Next season's genes are made up from this season's genes by a
process of mutation and recombination. Given this spring's gene pool, next spring's
gene pool may be any one of a large number of gene sets. Which genes are repre-
sented next spring depends not only on the details of reproduction, but on which
individual plants in this year's crop are most attractive to pollen-bearing bees, on
the local densities of clover population, on rainfall, etc. In short, next season's gene
pool depends on a wide variety of factors, ranging from the relative viability of
individuals within the species to dumb luck.
Although the vagaries of bees, weather, and chance are beyond our control,
information theory allows us to put a measure to the amount of selection that
they effect. Given mutation rates and the rules of recombination, one can calculate
the a priori probabilities for different genetic combinations in the next generation
of clover, given the genetic make-up of the present generation. This probability
distribution has a certain entropy, call it S(next | present). Weather, chance and
bees conspire to pick out a particular set of clover genes for the next generation. In
doing so, they supply the species with an amount of information equal to
S(next | present).
The total amount of information supplied to the species over n generations
can be quantified as follows. Let x1, x2, ... label the possible genetic configurations
of the first generation, second generation, etc. Let p(x1) be the probability that
the actual configuration of genes of the first generation is x1. Let p(x2 | x1) be
the a priori probability that the second generation has configuration x2 given that
the first generation had configuration x1. Then p(x1x2) = p(x1)p(x2 | x1) is the
probability of the sequence of configurations x1x2 over the first two generations.
Define p(x3 | x1x2), p(x1x2x3) = p(x1x2)p(x3 | x1x2), etc., in a similar fashion.
The information supplied by the environment in picking out the actual genetic
configuration, x'1, of the first generation is then S1 = -Σx1 p(x1) log p(x1). The
amount of information supplied in picking out the actual configuration of the second
generation, x'2, is S2|x'1 = -Σx2 p(x2 | x'1) log p(x2 | x'1), and so on. The total
amount of information supplied to the species by the environment over n generations
is then

Stot = S1 + S2|x'1 + ... + Sn|x'1x'2...x'n-1.
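Averaged over the earlier generations, this additivity of per-generation information is just the chain rule for entropy: the terms sum to the entropy of the whole trajectory. A toy two-generation example, with hypothetical numbers and entropies in bits:

```python
from math import log2

states = ['a', 'b']                          # two possible configurations
p1 = {'a': 0.75, 'b': 0.25}                  # p(x1), hypothetical numbers
p2 = {('a', 'a'): 0.5, ('a', 'b'): 0.5,      # p(x2 | x1)
      ('b', 'a'): 0.9, ('b', 'b'): 0.1}

S1 = -sum(p1[x] * log2(p1[x]) for x in states)
# Conditional entropy of the second generation, averaged over the first
S2 = -sum(p1[x1] * p2[(x1, x2)] * log2(p2[(x1, x2)])
          for x1 in states for x2 in states)
Stot = S1 + S2

# Chain rule: the per-generation terms sum to the entropy of the whole
# two-generation trajectory, p(x1 x2) = p(x1) p(x2 | x1).
joint = {(x1, x2): p1[x1] * p2[(x1, x2)] for x1 in states for x2 in states}
Sjoint = -sum(p * log2(p) for p in joint.values())
```

(The text's S2|x'1 is conditioned on the actual first-generation configuration; averaging over x1, as here, gives the expected information supply and makes the chain-rule identity exact.)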
The explicit expression for Stot in terms of probabilities is not very illumi-
nating. To get a more suggestive form for the information supplied, we turn to
coding theory. Suppose that we want to associate each possible genetic config-
uration with a binary number—that is, we want to encode the influence of the
environment on the species. Coding theory implies that in the most efficient self-
delimiting encoding (Huffman code),4 the length of the message associated with
the genetic configuration x'1 is -log2 p(x'1). Similarly, given that the first config-
uration was x'1, the length of the message that codes for x'2 is -log2 p(x'2 | x'1) =
-log2 p(x'1x'2) + log2 p(x'1).
In the most efficient coding, the sum of the lengths of the messages that encode
for x'1, x'2, ..., x'n at each stage is then

-log2 p(x'1) - log2 p(x'2 | x'1) - ... - log2 p(x'n | x'1 ... x'n-1) = -log2 p(x'1x'2 ... x'n),

which is simply the length of the message that encodes for the trajectory
x'1, x'2, ..., x'n in the most efficient coding over genetic trajectories. Define C_{x'1...x'n}
= -log2 p(x'1 ... x'n) to be the cost of the trajectory x'1 ... x'n.
In the case of clover, C_{x'1...x'n} measures the amount of information supplied to
the species by the environment over n generations. But cost can be applied to any
other system for which there exists a natural probability distribution over the set
of the system's different possible trajectories. Consider, for example, the cost of
computation. Suppose that one assigns equal probability to all sequences of zeros
and ones as programs for a binary computer.2 The probability that the first m bits
of such a randomly selected program happen to match a given m-bit program is 2^-m,
and the cost of the trajectory that the program encodes is C_m = -log2 2^-m = m.
The cost of executing a given program is simply equal to its length.
Cost as defined is a function of process. To assign a cost to a particular piece of
information, one identifies the various processes that can result in that information.
Each such process, g, has a cost, C_g = -log p(g). The information's cost can
either be identified with the average cost of the processes that result in it,
C = -Σg p(g) log p(g), or with its minimum cost, which is just the cost of the
most likely such process. If the piece of information is a number, and the processes
are the various computations that result in that number, then the number's minimum
cost is just its algorithmic complexity—the length of the shortest algorithm that
produces that number as output.
196 Seth Lloyd
A MODEL
Consider a hypothetical two-sexed, monogamous species with N members, in which
each couple mates once each year, resulting in two offspring. There are N/2 mem-
bers of each sex, and (N/2)! different possible combinations of couples. Half of each
offspring's genes come from the mother, half from the father, with recombination
occurring at M sites, for a total of M!/((M/2)!)^2 possible genetic combinations for
each offspring, given the genetic make-up of the parents. The total amount of in-
formation involved in picking out one combination of couples and one combination
of genes for each offspring is thus
log2 [ (N/2)! ( M!/((M/2)!)^2 )^N ] ≈ (N/2) log2(N/2e) + NM,

by Stirling's formula.
If N is a million and M is on the order of 30, the cost per generation is roughly
40 million bits, with the amounts of information that go into mate selection and
into recombination of comparable magnitudes.
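This estimate is easy to reproduce. The sketch below compares the exact bit count, log2[(N/2)!] + N log2[M!/((M/2)!)^2], with the Stirling approximation (N/2) log2(N/2e) + NM; both come out at a few tens of millions of bits, consistent with the rough figure of 40 million:

```python
import math

N = 1_000_000   # population size
M = 30          # recombination sites

def log2_factorial(k):
    # log2(k!) computed via the log-gamma function
    return math.lgamma(k + 1) / math.log(2)

mate_bits = log2_factorial(N // 2)                        # log2[(N/2)!]
recomb_bits = N * (log2_factorial(M) - 2 * log2_factorial(M // 2))
exact = mate_bits + recomb_bits

# Stirling-formula estimate quoted in the text
approx = (N / 2) * math.log2(N / (2 * math.e)) + N * M

print(round(exact / 1e6, 1), round(approx / 1e6, 1))  # millions of bits
```

The mate-selection term contributes roughly 9 million bits and recombination the rest, so the two contributions are indeed of comparable magnitude.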
The amount of information required to specify the effects of mutation can easily
be included in this model. If p bits are required, on average, to describe the positions
and types of mutations in a given individual per generation, then Np bits are
added to the total cost. If the location of each recombination site can vary by q
base pairs along the gene, then M log2 q bits are added to the cost.
Valuable Information 197
PROSPECTUS
Cost is a measure of the amount of information required by a process. Unless one
adopts a labor theory of value that ignores demand, the cost of a process that
produces a piece of information does not equal its value. Information's value should
depend on demand as well as supply—value should reflect usefulness. In addition,
cost does not capture distinctions between different sorts of information: as defined,
cost gives equal weight both to random information supplied by mutation and to
ordered information supplied by selective pressure. A more comprehensive measure
might discount random information, while retaining the contribution of ordered
information from the environment.7
Nevertheless, cost is the obvious measure of the total amount of information
that needs to be supplied to a system in the course of a given process. As con-
firmation, cost reduces to program length when specialized to computation. And,
although the cost of a piece of information may not determine its value, the genetic
cost of the evolution of a species can be tens of millions of bits of non-random
information per generation, or much more. One might wish to be careful about
bringing such a species to extinction, in the event that one must pay for what one breaks.
ACKNOWLEDGMENTS
Work supported in part by the U.S. Department of Energy under Contract No.
DE-AC03-81ER40050.
REFERENCES
1. Adams, J. C. Manuscript Nos. 1841-1846, St. John's College Library, Cambridge University.
2. Bennett, C. H. Intl. J. Theor. Phys. 21 (1982):905.
3. Frautschi, S. Science 217 (1982):593.
4. Hamming, R. W. Coding and Information Theory. Englewood Cliffs: Prentice
Hall, 1986.
5. Le Verrier, U. J. J. C. R. Acad. Sci. 21 (1845):1050.
6. Lloyd, S., and H. R. Pagels. Ann. Phys. 188 (1988):186.
7. Lloyd, S., and H. R. Pagels. To be published.
8. Shannon, C. E., and W. Weaver. The Mathematical Theory of Communica-
tion. Urbana: University of Illinois Press, 1949.
Dilip K. Kondepudi
Department of Chemistry, Box 7486, Wake Forest University, Winston-Salem, NC 27109
INTRODUCTION
In polymers, nature realizes information-carrying sequences in a simple way. In the
last three decades, molecular biology has revealed to us how information is carried
in DNA sequences and how this information is translated into proteins that have a
definite function. We have many details of these awesome, complex processes, but
we have only a poor understanding of how and when such information-processing
systems will spontaneously evolve. The questions regarding the spontaneous
evolution of information-processing systems are more general than the question of the
origin of life; they are questions regarding the origin of "complexity." Though com-
plexity does not have a precise physical meaning, we can describe some aspects
of it in terms of algorithmic information, especially in the case of self-organizing
polymer systems.
As will be shown below, thermodynamic quantities such as entropy and free
energy do not characterize all the essential features of complexity. We need new
physical quantities, perhaps quantities such as algorithmic information. In the con-
text of polymers, algorithmic information can be associated with a particular poly-
mer sequence and an algorithm can be associated with a catalyst that produces this
sequence. I would also like to point out that a physical significance of algorithmic
In the equilibrium state (A), all sequences appear with equal probability. In
the nonequilibrium states (B) and (C), only the sequences R — R — R — R — R and
R—S—R—R—S respectively appear.
A nonequilibrium state such as (B) or (C) has lower entropy compared to the
equilibrium state (A). Since there are 2^5 possible species, the difference in entropy
is ΔS = nR ln(2^5), in which n is the number of moles and R is the gas constant. At
temperature T, the difference in Helmholtz free energy is ΔF = TΔS. The immediate
consequence of this is that "weights can be lifted" through the transformation of
the nonequilibrium state to the equilibrium state. The amount of useful work that
can be obtained is equal to ΔF.
This can be done by using van't Hoff's boxes and a suitable membrane as shown
in Figure 2. Here we assume that the system is an ideal gas. The scheme consists
of two chambers, A and B, separated by a membrane that is permeable only to
polymers of a particular sequence, such as R-R-R-R-R. The pressure and
volume of chamber A are PA and VA and, for chamber B, they are PB and VB.
The entire setup is in contact with a heat reservoir at temperature T.
The volumes of the two chambers can be altered by moving pistons, as shown in
Figure 2. In chamber B, there is no catalyst that converts R to S, so the
nonequilibrium distribution remains as it is. In chamber A, however, there is a
catalyst, and hence the sequences of the polymers entering this chamber will
transform to other sequences. Thus the number of species in chamber A increases
as molecules enter it from chamber B. The partial pressure of the polymer
R-R-R-R-R in chamber A will equal the pressure, PB, in B. Since the total number
of species is 2^5, the pressure in chamber A is PA = 2^5 × PB. If initially VA = 0
and VB = VB0, then by slowly and simultaneously moving the pistons in such a way
that PA and PB are maintained constant (at their equilibrium values), all the
molecules can be forced
[Figure 2: van't Hoff box diagrams. Chamber B contains only the sequence
R-R-R-R-R; chamber A, containing the catalyst, holds the equilibrium mixture of
sequences (R-S-R-R-S, R-R-R-S-S, S-S-R-R-S, ...). The chambers are separated by
membranes permeable only to particular sequences.]
from chamber B to chamber A. At the end of this process, VA will equal VB0/2^5.
Weights can now be lifted by allowing the gas in A to expand to the initial volume
VB0. It can easily be seen that the amount of useful work that can be obtained is

ΔF = nRT ln(2^5).
Clearly, through a similar conversion of the nonequilibrium state (C) (shown
in Figure 1) to the equilibrium state (A), the same amount of useful work can be
obtained. In this way we see that a physical consequence of a nonequilibrium state
is that: "weights can be lifted."
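As a numerical illustration of the available work, the sketch below evaluates ΔF = TΔS = nRT ln(2^5); the values n = 1 mol and T = 300 K are assumptions chosen for illustration, not values from the text:

```python
import math

R = 8.314   # gas constant, J/(mol K)
T = 300.0   # temperature in K (assumed for illustration)
n = 1.0     # moles of pentamer (assumed)

delta_S = n * R * math.log(2 ** 5)   # entropy difference, J/K
delta_F = T * delta_S                # maximum useful work, Delta F = T * Delta S

print(round(delta_F / 1000, 1), "kJ")
```

For one mole at room temperature the "liftable weight" corresponds to roughly 8.6 kJ of work, the same order as familiar mixing free energies.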
Turning now to the algorithmic-information point of view, we can distinguish
nonequilibrium states (such as (B) and (C) shown in Figure 1) on the basis of
algorithmic information: the algorithm required to generate the sequence (B) is
surely shorter than the algorithm for (C). However, this difference does not have a
simple physical consequence such as "lifting weights." That no "lifting of weights"
can be accomplished through the conversion of state (C) to state (B) follows from
the existence of the process, represented in Figure 3, in which the state (B) is
converted to state (C) without any expenditure of free energy. In Figure 3, the
membrane separating the chambers L and M is permeable only to the molecule
R — R — R — R — R while the membrane separating the chambers M and R is
permeable only to R — S — R— R— S. In the central chamber, M, the polymers are
in a state of equilibrium due to the presence of a catalyst. Since the mole fractions
of all the different species are equal, the pressures in chambers L and R will be
equal. Hence by moving the pistons in the directions indicated in the figure, one
species can be converted to another with no expenditure of energy.
Note that from a computational point of view the sequence R—R—R—R—R has
been converted reversibly to the sequence R — S — R — R — S with no dissipation
of energy. This is one way of realizing the general observation made by Charles
Bennettl that computation can be done reversibly, in the sense of Carnot. Thus,
by inserting and removing appropriate membranes at the appropriate times, any
sequence can be converted to any other without dissipation.
[Figure 3: three chambers L, M, and R separated by pistons and membranes. The
membrane between L and M passes only R-R-R-R-R; the membrane between M and R
passes only R-S-R-R-S; the central chamber M holds the equilibrium state
maintained by the catalyst.]
REFERENCES
1. Bennett, C. H. "The Thermodynamics of Computation—A Review." Intl. J. Theor. Phys. 21 (1982):905-940.
2. Spiegelman, S. "An in Vitro Analysis of a Replicating Molecule." Amer. Sci.
55 (1967):221-264.
3. Zurek, W. H. "Algorithmic Randomness and Physical Entropy." Phys. Rev.
A40 (1989):4731-4751.
4. Zurek, W. H. "Algorithmic Information Content, Church-Turing Thesis,
Physical Entropy, and Maxwell's Demon." This volume.
Tad Hogg
Xerox Palo Alto Research Center, Palo Alto, CA 94304
INTRODUCTION
Distributed problem solving is a pervasive and effective strategy in situations re-
quiring adaptive responses to a changing environment. As shown by the examples of
the scientific community, social organizations, the economy, and biological ecosys-
tems, a collection of interacting agents individually trying to solve a problem using
different techniques can significantly enhance the performance of the system as
a whole. This observation also applies to computational problems, such as traffic
control, acting in the physical world, and interpreting real-time multi-sensor data,
where the emergence of computer networks and massively parallel machines has
enabled the use of many concurrent processes.
These tasks generally require adaptability to unexpected events, dealing with
imperfect and conflicting information from many sources, and acting before all rele-
vant information is available. In particular, incorrect information can arise not only
from hardware limitations but also from computations using probabilistic methods,
heuristics, rules with many exceptions, or learning resulting in overgeneralization.
Similarly, delays in receiving needed information can be due to the time required
to fully interpret signals in addition to physical communication delays.
Directly addressing problems with these characteristics usually involves tech-
niques whose resource requirements (e.g., computer time) grow exponentially with
the size of the problem. While such techniques thus have high computational
complexity,16 more sophisticated approaches, employing various heuristic tech-
niques, can often overcome this prohibitive cost. Heuristics are effective in many
interesting real-world computation problems because of the high degree to which
experience on similar problems, or subtasks, can be generalized and transferred to
new instances. Moreover, there are often many alternate approaches to each prob-
lem, each of which works well in circumstances that are difficult to characterize a
priori. It is thus of interest to examine the behavior of collections of interacting
processes or agents which solve such problems, abstracted away from detailed is-
sues of particular algorithms, implementations, or hardware. These processes must
make decisions based upon local, imperfect, delayed, and conflicting information
received from other agents reporting on their partial success towards completion of
a goal. Such characteristics, also found in social and biological communities, lead
us to refer to these collections as computational ecosystems.9
A general issue in studying such systems is to characterize their complexity
and relate it to their behavior. While a number of formal complexity measures have
been proposed, one appropriate for describing the performance of computational
systems should capture the observation that both ordered and random problems can
be addressed with relatively simple techniques. In particular, explicit algorithms are
effective when there are a limited number of contingencies to handle. Similarly, sta-
tistical techniques are useful where there are limited dependences among variables
and the relevant conditional probabilities can easily be estimated from the data.
To consider the more interesting, and more difficult, computational problems de-
scribed above, we focus on complexity measures, such as diversity, that assign high
values to intermediate situations.6,7 This is in contrast to conventional measures
of algorithmic randomness4,11,18 which are primarily concerned with the minimal
program required to reproduce a given result rather than with the difficulty of
devising programs to effectively solve particular problems.
The Dynamics of Complex Computational Systems 209
example concerns a search for a good, but not necessarily optimal, state in a lim-
ited amount of time. Throughout these examples we show how the existence of a
diverse society of processes is required to achieve this performance enhancement.
We thus obtain a connection between a measure of the system's complexity and its
performance.
CONCURRENT SEARCH
We consider the case of heuristically guided search,14 which applies to a wide range
of problems. A search procedure can be thought of as a process which examines a
series of states until a particular goal state is obtained. These states typically rep-
resent various potential solutions of a problem, usually obtained through a series of
choices. Various constraints on the choices can be employed to exclude undesirable
states. Examples range from well-defined problem spaces as in chess to problems in
the physical world such as robot navigation.
As a specific example, consider the case of a d-dimensional vector, each of whose
components can take b different values. The search consists of attempting to find a
particular suitable value (or goal) among the bd possible states. It is thus a simple
instance of constrained search involving the assignment of values to components of
a vector subject to a number of constraints. A random search through the space
will, on average, find the goal only after examining one half of the possibilities,
an extremely slow process for large problems (i.e., the required time is exponential
in d, the number of components to be selected). Other specific approaches can be
thought of as defining an order in which the possible states are examined, with
the ensuing performance characterized by where in this sequence of states the goal
appears. We now suppose that n agents or processes are cooperating on the solution
of this problem, using a variety of heuristics and that the problem is completed by
the first agent to find the solution. The heuristic used by agent i can be simply
characterized by the fraction f_i, between 0 and 1, of unproductive states that it
examines before reaching the goal. A perfect heuristic will thus correspond to f_i = 0
and one which chooses at random has f_i = 1/2.
In addition to their own search effort, the agents exchange information regard-
ing the likely location of the goal state within the space. In terms of the sequence
of states examined by a particular agent, the effect of good hints is to move the
goal toward the beginning of the sequence by eliminating from consideration states
that would otherwise have to be examined. A simple way to characterize a hint is
by the fraction of unproductive nodes, which would otherwise have been examined
before reaching the goal, that the hint removes from the search. Since hints need
not always be correctly interpreted, they can also lead to an increase in the actual
number of nodes examined before the answer is found. For such cases, we suppose
that the increase, on average, is still proportional to the amount of work remaining,
i.e., bad hints won't cause the agent to nearly start over when it is already near
the goal but will instead only cause it to reintroduce a small number of additional
possibilities. Note that the effectiveness of hints depends not only on the validity
of their information, but also on the ability of recipients to interpret and use them
effectively. In particular, the effect of the same hint sent to two different agents can
be very different.
A simple example of this characterization of hint effectiveness is given by a
concurrent search by many processes. Suppose there are a number of characteris-
tics of the states that are important (such as gender, citation, and subfield in a
database). Then a particular hint specifying gender, say, would eliminate one half
of all remaining states in a process that is not explicitly examining gender.
To the extent that the fractions of unproductive nodes pruned by the various
hints are independent, the fraction of nodes that an agent i will have to consider is
given by

f_i = f_i^initial ∏_{j≠i} f_{ij}^hint ,   (1)

where f_{ij}^hint is the fraction of nodes remaining after the hint that agent i
receives from agent j, and f_i^initial characterizes the performance of the agent's
initial heuristic. Note that hints which are very noisy or uninterpretable by the
agent correspond to a fraction equal to one because they do not lead to any pruning
on the average. Conversely, a perfect hint would directly specify the goal and make
f_i equal to zero.
Furthermore, we should note that since hints will generally arrive over time during
the search, the fractions characterizing the hints are interpreted as effective values
for each agent, i.e., a good hint received late, or not utilized, will have a small effect
and a corresponding hint fraction near one.
The assumption of independence relies on the fact that the agents broadcast
hints that are not overlapping, i.e., the pruning of two hints won't be correlated.
This will happen whenever the agents are diverse enough so as to have different
procedures for their own searches. If the agents were all similar, i.e., the pruning
was the same for all of them, the product in Eq. (1) would effectively only have one
factor. For intermediate cases, the product would only include those agents which
differ from each other in the whole population. As an additional consideration, the
overall heuristic effectiveness fi must not exceed one, so there is a limit to the
number of independent hint fractions larger than one that can appear in Eq. (1).
We therefore define neff to be the effective number of diverse agents, which in turn
defines the actual number of terms in the product of Eq. (1). This leads to a direct
dependence of the pruning effectiveness on the diversity of the system. Although
the hints that individual agents find useful need not come from the same sources,
for simplicity, we suppose the number of diverse hints received by each agent is the
same.
We now derive the law that regulates the pruning effectiveness among agents.
By taking logarithms in Eq. (1), one obtains

log f_i = log f_i^initial + Σ_{j≠i} log f_{ij}^hint ,   (2)

where we have included only terms arising from diverse hints. If the individual
distributions of the logarithms of the fractions satisfy the weak condition of
having a finite variance, and if the number of hints is large, then the central limit
theorem applies. Therefore, the values of log f_i for the various agents will be
normally distributed around their mean with standard deviation σ, i.e., according to
N(µ, σ, log f_i). Here µ and σ² are the mean and variance of the log f_i of the various
agents, which are given by the sums of the corresponding moments of the individual
terms in the sum. In other words, f itself is distributed according to the lognormal
distribution.
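The emergence of this limiting behavior can be seen in a small simulation; the uniform hint-fraction distribution used below is an illustrative assumption (centered on one, so hints on average neither help nor hinder):

```python
import math
import random

random.seed(0)
n_agents = 20000
n_eff = 50   # effective number of diverse hints per agent (assumed)

# Each agent's fraction f_i is its initial fraction times a product of
# independent hint fractions, as in Eq. (1).
log_f = []
for _ in range(n_agents):
    f = 0.5   # f_i^initial: a bit better than random
    for _ in range(n_eff):
        f *= random.uniform(0.8, 1.2)   # assumed hint-fraction distribution
    log_f.append(math.log(f))

mean = sum(log_f) / n_agents
sigma = math.sqrt(sum((x - mean) ** 2 for x in log_f) / n_agents)

# If log f_i is approximately normal, about 68% of agents should lie
# within one standard deviation of the mean.
within = sum(abs(x - mean) < sigma for x in log_f) / n_agents
print(round(within, 2))
```

The one-sigma fraction comes out close to the Gaussian value of 0.68, as the central limit theorem predicts for the sum of many independent log-fractions.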
This can be viewed as an extension of the previous example in that successive levels
of the tree correspond to choices for successive components of the desired vector,
with the leaves of the tree corresponding to fully specified vectors. The additional
tree structure becomes relevant when the heuristic can evaluate choices based on
vectors with some components unspecified. These evaluations offer the possibility
of eliminating large groups of nodes at once.
The search proceeds by starting at the root and recursively choosing which
nodes to examine at successively deeper levels of the tree. At each node of the tree
there is one correct choice, in which the search gets one step closer to the goal.
All other choices lead away from the goal. The heuristic used by each agent can
then be characterized by how many choices are made at a particular node before
the correct one is reached. The perfect heuristic would choose correctly the first
time, and would find the goal in d time steps, whereas the worst one would choose
the correct choice last, and hence be worse than random selection. To characterize
an agent's heuristic, we assume that each incorrect choice has a probability p of
being chosen by the heuristic before the correct one. Thus the perfect heuristic
corresponds to p = 0, random to p = 0.5, and worst to p = 1. For simplicity, we
suppose the heuristic effectiveness, as measured by p, is uniform throughout the
tree. Alternatively, p can be thought of as the value of the effectiveness averaged
over all nodes in the tree. In the latter case, any particular correlations between
nodes are ignored, in the spirit of a mean-field theory, which can be expected to
apply quite well in large-scale problems. Note that while p specifies the fraction of
incorrect choices made before the correct one on average throughout the tree, this
probabilistic description allows for variation among the nodes.
The a posteriori effect of hints received from other agents can be described as
a modification to an agent's value of p. Assuming independence among the hints
received, this probability is given by
p_i = p_i^initial ∏_{j=1, j≠i}^{n_eff} f_{ij}^hint ,   (6)

where p_i^initial characterizes the agent's initial heuristic and the hint fractions are the
same as introduced in the previous section, but now averaged over the entire tree.
By supposing the various quantities appearing in Eq. (6) are random variables,
we again obtain the universal lognormal distribution (over the set of agents) of
heuristic effectiveness when there are a large number of agents exchanging hints.
Given this distribution of local decision effectiveness, we now need the distri-
bution of performance in the full search problem, i.e., the rate at which the search
for the goal is completed. This relationship is more complex than in the unstruc-
tured example considered above, and in particular it produces a phase transition
in overall agent performance at a critical value of p. This sharp transition leads to
the possibility of an additional enhancement in performance.
Specifically, the overall performance is related to the time T, or number of
steps, required to reach the goal from the root of the tree. To quantify the search
performance, we consider the search speed given by

S = (number of nodes in the tree) / (number of steps to the goal) = N_total / T ,   (7)

and the mean number of steps to the goal is

⟨T⟩ = d + (µ - p)(d - µ - dµ + µ^{d+1}) / (µ - 1)² ,   (8)

where µ = bp. As the depth of the tree increases, this becomes increasingly singular
around the value µ = 1, indicating a sudden transition from linear to exponential
search. This is illustrated in Figure 2, which shows the behavior of s = d/⟨T⟩ as a
function of the local decision effectiveness characterized by p. Near the transition,
a small change in the local effectiveness of the heuristic has a major impact on
the global behavior of large-scale search problems. The existence of such a phase
transition implies that, in spite of the fact that the average behavior of cooperative
algorithms may be far into the exponential regime, the appearance of an extended
tail in performance makes it possible for a few agents to solve the problem in
polynomial time. In such a case, one obtains a dramatic improvement in overall
system performance by combining these two effects. We should note that other
search topologies such as general graphs also exhibit these phase transitions3 so
these results can apply to a wide range of topologies found in large-scale search
problems.
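The linear-to-exponential transition can be sketched numerically. The function below uses ⟨T⟩ = d + (µ - p)(d - µ - dµ + µ^{d+1})/(µ - 1)² with µ = bp, a reconstruction of Eq. (8) from this garbled copy that matches the limiting behavior quoted in the text, so treat the numbers as qualitative:

```python
def mean_time(p, b, d):
    """Reconstructed Eq. (8); singular at mu = b*p = 1, so avoid p = 1/b."""
    mu = b * p
    return d + (mu - p) * (d - mu - d * mu + mu ** (d + 1)) / (mu - 1) ** 2

def speed(p, b, d):
    return d / mean_time(p, b, d)   # s = d / <T>

b, d = 5, 100
s_linear = speed(0.1, b, d)        # mu = 0.5 < 1: search time linear in d
s_exponential = speed(0.3, b, d)   # mu = 1.5 > 1: exponentially slow search

print(s_linear > 0.5, s_exponential < 1e-10)
```

A perfect heuristic (p = 0) gives ⟨T⟩ = d and s = 1, while just above the critical value p = 1/b the speed collapses by many orders of magnitude, which is the sharp transition visible in Figure 2.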
FIGURE 2 Plot of s vs. local decision effectiveness for trees with branching ratio 5
and depths 10, 20, and 100. The distinction between the linear regime (p < 0.2) and
the exponential one becomes increasingly sharp as the depth increases. The dashed
curve is the limit for an infinitely deep tree and shows the abrupt change at p = 0.2
from linear to exponential search.
Finally, to illustrate the result of combining diverse hints with the phase
transition in tree searches, we evaluate the distribution of relative global speed s for the
agents searching in a tree with a branching ratio b = 5 and depth d = 20. This com-
bines the distribution of local decision effectiveness with its relation to global speed.
As in the previous example, we suppose hints on average neither help nor hinder the
agents. In particular, we take the f_{ij}^hint values to be normally distributed according
to N(1, 0.015, f). We also take the initial performance of the agents (i.e., p_i^initial)
to be normally distributed according to N(0.33, 0.0056, p), which corresponds to a
bit better than random search. The resulting distributions were evaluated through
simulations of the search process and are compared in Figure 3, on a logarithmic
scale to emphasize the extended tails.
In this case, the enhancement of the global performance of the system is most
dramatic at the higher end of the distribution, not all of which is shown in the
figure. In this example, the top 0.1 percentile agents will have an enhancement of
global speed over the case of no hints by factors of 2 and 41 for 10 and 100 hints
respectively. This illustrates the nonlinear relation between performance, number
of agents, and diversity of hints.
SATISFICING SEARCHES
In many heuristic search problems, the exponential growth of the search time with
problem size forces one to accept a satisfactory answer rather than an optimal one.
In such a case, the search returns the best result found in a fixed amount of time
rather than continuing until the optimal value is found. To the extent that such
returned results have high value, they can provide acceptable solutions to the search
problem without the cost involved in obtaining the true optimum. A well-known
instance is the traveling salesman problem, consisting of a collection of cities and
distances between them and an attempt to find the shortest path which visits each
of them. The time required to find this path grows exponentially with the number
of cities. For large instances of the problem, one must settle instead for paths that
are reasonably short, compared to the length of an average path, but not optimal.
In these cases of limited search time, the extended tails of the cooperative
distributions discussed above result in a better value returned compared to cases in
which hints are not used. To see this we consider an unstructured search problem
where the states have various values, v, which we take to be integers between 0
and some maximum V. In the previous examples, one could view the single goal
as having the maximum value while all other states have a value of 0. To allow
for the possible usefulness of nonoptimal states, we suppose that their values are
distributed throughout the range. In order that a simple random search is unlikely
to be effective, we need relatively few states with high value. A simple distribution
of values satisfying these requirements is given by the binomial distribution:
m_v = (V choose v) 3^{V-v} ,   (9)

where m_v is the number of states with value v. Note that this has exactly one state
with the maximum value and most states have smaller values clustered around the
average V/4.
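These properties of the binomial value distribution can be checked directly for a small V (V = 8 below is an arbitrary illustrative choice):

```python
from math import comb

V = 8   # maximum value, kept small for illustration
m = [comb(V, v) * 3 ** (V - v) for v in range(V + 1)]   # Eq. (9)

assert m[V] == 1                 # exactly one state with the maximum value
assert sum(m) == 4 ** V          # total number of states
avg = sum(v * mv for v, mv in enumerate(m)) / sum(m)
assert abs(avg - V / 4) < 1e-12  # values cluster around the average V/4
print("ok")
```

The distribution is binomial with success probability 1/4, so high-value states are exponentially rare, as the text requires for random search to be ineffective.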
For problems of this kind, the effectiveness of a heuristic is determined by how
well it can discriminate between states of high and low value. When faced with
selecting among states with a range of values, a good heuristic will tend to pick
those states with high value. That is, the likelihood of selecting a state will increase
with its value. Moreover, this increase will become more rapid as the heuristic
improves. As a concrete example, we suppose that the heuristics used by the various
agents in the search are characterized by a discrimination parameter a such that
states with value v are selected by the heuristic with relative probability a^v. Large
values of a provide excellent discrimination while a = 1 corresponds to random
selections. In terms of our previous examples, in which only the goal had a nonzero
value, the relative selection probabilities were 1 for the goal and p for all other
states. Thus we see that this characterization of heuristic discrimination identifies
a^V with 1/p in the case of only two distinct values. As in the previous examples,
cooperation among diverse agents leads to a lognormal distribution of selection
probability values among the agents. Here this means the a values will themselves
be lognormally distributed.
Instead of focusing on the time required to find the best answer, we examine the
distribution of values returned by the various agents in a given interval of time. As
an extreme contrast with the previous examples, which continued until the goal was
found, we allow each agent to examine only one state, selected using the heuristic.
The value returned by the agent will then correspond to this state. (If additional
time were available, the agents would continue to select according to their heuristic
and return the maximum value found.) These simplifications can be used to obtain
the distribution of returned values resulting from interactions among the agents as
a function of the number of diverse agents, n_eff.
Since all points are available to be selected, the probability that an agent op-
erating with a heuristic discrimination level of a will select a state with value v
is
p(a, v) = m_v a^v / Σ_{u=0}^{V} m_u a^u = (V choose v) (a/3)^v / (1 + a/3)^V .   (10)
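Because Σ_u m_u a^u = (3 + a)^V for the value distribution of Eq. (9), the selection probability reduces to the closed form C(V, v)(a/3)^v / (1 + a/3)^V. A quick numerical check with illustrative values of V and a:

```python
from math import comb

V, a = 10, 1.5   # illustrative values
m = [comb(V, u) * 3 ** (V - u) for u in range(V + 1)]   # Eq. (9)
norm = sum(mu * a ** u for u, mu in enumerate(m))       # equals (3 + a)**V

for v in range(V + 1):
    direct = m[v] * a ** v / norm                        # definition
    closed = comb(V, v) * (a / 3) ** v / (1 + a / 3) ** V  # closed form
    assert abs(direct - closed) < 1e-12
print("match")
```

The normalization is just the binomial theorem applied to Σ C(V,u) 3^{V-u} a^u, which is why the closed form exists.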
To finally obtain the distribution of values returned by the agents, this must be
integrated over the distribution of a values. When hints are exchanged, this
parameter will be distributed lognormally with a mean µ and standard deviation σ
depending on the corresponding values for the hint fractions. The result can be
written as

P(v) = (1/√(2π)) (V choose v) e^{vµ̃ + (vσ)²/2} ∫_{-∞}^{∞} dt e^{-t²/2} (1 + e^{µ̃ + vσ² + σt})^{-V} ,   (11)

where µ̃ = µ - ln 3.
The distributions are compared in Figure 4 for the case in which the initial
agents' heuristic has a = 1.5 (i.e., a bit better than random value discrimination)
and the hint fractions are distributed according to N(1, 0.05), again giving a case
in which the hints, on average, neither help nor hinder the search. In this case, the
top 0.1 percentile level is at a value v = 52 when neff = 10 and v = 70 when
neff = 100. This compares with the noninteracting case in which this performance
level is at v = 48.
220 Tad Hogg
CONCLUSION
The effectiveness of the hints exchanged among the agents discussed in the previous
sections depended critically on how independently they were able to prune the
search space. At one extreme, when all the agents use the same technique, the hints
will not provide any additional pruning. Similarly, if the various agents randomly
search through the space and only report the nodes which they have already ex-
amined, this will not significantly help the other agents. More specifically, highly
structured problems can be rapidly addressed by relatively simple direct algorithms.
Although various processes may run in parallel, the structure will allow the decom-
position to be such that each agent provides an independently needed part of the
answer. This would give no possibility of (and no need for) improvement with hints.
On the other hand, in highly disordered problems, each part of the search space will
be unrelated to other parts, giving little or no possibility of transferring experience
among the agents, and hence exponentially long solution times.
Many interesting problems, such as those requiring adaptive response to the
physical world or finding entries in large databases relevant to various users, are
intermediate in nature. Although simple direct algorithms do not exist, these prob-
lems nevertheless have a large degree of redundancy thus enabling the transfer of
results between different parts of the search. It is just this redundancy which al-
lows for the existence of effective heuristics and various techniques which can be
exploited by collections of cooperative processes. In particular, the fact that most
constraints in a search involve only a few of the choices at a time gives an effective
locality to most of the interactions among allowed choices. More fundamentally, this
characteristic can be viewed as a consequence of intermediate stable states required
for the design or evolution of the systems dealt with in these problems.17
The examples considered in the previous sections have shown quantitatively
how systems which effectively deal with this class of problems (e.g., economic and
biological communities as well as distributed computer systems currently under
development) can benefit from the exchange of hints. For topological structures
with sharp phase transitions in behavior, performance can be further enhanced
when the exchange of hints allows even a few agents to reach the transition point.
In summary, this provides a connection between complexity measures and actual
performance for interacting computational processes.
There remain a number of interesting open issues. The examples presented
above ignored the fact that hints will actually arrive over time, presumably im-
proving as other agents spend more time in their individual searches. On the other
hand, the usefulness of the hints to the recipient process could decline as it pro-
gresses with its own search, filling in specific details of a solution. Thus, in more
realistic models, the hint pruning fractions f_{ij} will depend on the current state
of agents i and j, giving rise to a range of dynamical behaviors. In addition, the
examples neglected any variation in cost (in terms of computer time) with the de-
gree to which hints were effectively used. More generally, the effectiveness of a hint
could depend on how much time is spent constructing it (e.g., presenting it in a
most general context where it is more likely to be applicable) and analyzing it. Such
variation could be particularly important in satisficing searches where additional
time devoted to improving hints means fewer states can be examined. Finally, over
longer time scales, as the system is applied to a range of similar problems, there is
the possibility of increased diversity among the agents as they record those strate-
gies and hints which proved most effective. Thus, in addition to showing that the
most diverse systems are best able to address difficult problems, this opens the pos-
sibility of studying the gradual development of specialized agents and the resulting
improvement in performance.
ACKNOWLEDGMENTS
During the course of this work, I have benefited from many conversations with B.
Huberman, J. Kephart, and S. Stornetta.
REFERENCES
1. Aitchison, J., and J. A. C. Brown. The Log-Normal Distribution. Cambridge:
Cambridge Univ. Press, 1957.
2. Bennett, C. H. "Dissipation, Information, Computational Complexity and the
Definition of Organization." In Emerging Syntheses in Science, edited by D.
Pines. Santa Fe, NM: Santa Fe Institute, 1986, 297-313.
3. Bollobas, B. Random Graphs. New York: Academic Press, 1985.
4. Chaitin, G. "Randomness and Mathematical Proof." Sci. Am. 232 (1975):47-
52.
5. Crow, Edwin L., and Kunio Shimizu, editors. Lognormal Distributions: The-
ory and Applications. New York: Marcel Dekker, 1988.
6. Crutchfield, J. P., and K. Young. "Inferring Statistical Complexity." Phys.
Rev. Lett. 63 (1989):105-108.
7. Huberman, B. A., and T. Hogg. "Complexity "and Adaptation." Physica 22D
(1986):376-384.
8. Huberman, B. A., and T. Hogg. "Phase Transitions in Artificial Intelligence
Systems." Artificial Intelligence 33 (1987):155-171.
9. Huberman, Bernardo A., and Tad Hogg. "The Behavior of Computational
Ecologies." In The Ecology of Computation, edited by B. A. Huberman.
Amsterdam: North Holland, 1988, 77-115.
10. Kephart, J. 0., T. Hogg, and B. A. Huberman. "Dynamics of Computational
Ecosystems." Phys. Rev. A 40 (1989):404-421.
11. Kolmogorov, A. N. "Three Approaches to the Quantitative Definition of Ran-
domness." Prob. of Info. Trans. 1 (1965):1-7.
12. Krebs, C. J. Ecology. New York: Harper and Row, 1972.
13. Montroll, E. W., and M. R. Shlesinger. "On 1/f Noise and Other Distribu-
tions with Long Tails." Proc. Natl. Acad. Sci. (USA) 79 (1982):3380-3383.
14. Pearl, J. Heuristics: Intelligent Search Strategies for Computer Problem Solv-
ing. Reading, MA: Addison-Wesley, 1984.
15. Shockley, W. "On the Statistics of Individual Variations of Productivity in
Research Laboratories." Proc. of the IRE 45 (1957):279-290.
16. Sedgewick, R. Algorithms. New York: Addison-Wesley, 1983.
17. Simon, H. The Sciences of the Artificial. Cambridge, MA: MIT Press, 1962.
18. Solomonoff, R. "A Formal Theory of Inductive Inference." Info. & Control 7
(1964):1-22.
James P. Crutchfield† and Karl Young‡
†Physics Department, University of California, Berkeley, CA 94720; ‡permanent address:
Physics Board of Studies, University of California, Santa Cruz, CA 95064
[1]It is not an idle speculation to wonder what happens to Einstein's universe if his clock contains
an irreducible element of randomness, or more realistically, if it is chaotic.
Computation at the Onset of Chaos 225
various hard-to-solve, but easily verified, problems. This is the class of nondeter-
ministic polynomial (NP) problems. If one can guess the correct answer, it can be
verified as such in polynomial time. The equivalence between NP problems, called
NP-completeness, requires that within a polynomial number of TM steps a prob-
lem can be reduced to one hardest problem.34 The invariant of this polynomial-time
reduction equivalence is the growth rate, as a function of problem size, of the com-
putation required to solve the problem. This growth rate is called the algorithmic
complexity.[5]
The complementarity between these two endeavors can be made more explicit
when both are focused on the single problem of modeling chaotic dynamical systems.
Ergodic theory is seen to classify complicated behavior in terms of information
production properties, e.g., via the metric entropy. Computation theory describes
the same behavior via the intrinsic amount of computation that is performed by
the dynamical system. This is quantified in terms of machine size (memory) and
the number of machine steps to reproduce behavior.[6] It turns out, as explained in
more detail below, that this type of algorithmic measure of complexity is equivalent
to entropy. As a remedy to this we introduce a complexity measure based on BTMs
that is actually complementary to the entropy.
The emphasis in the following is that the tools of each field are complemen-
tary and both approaches are necessary to completely describe physical complexity.
The basic result is that if one is careful to restrict the class of computational mod-
els assumed to be the least powerful necessary to capture behavior, then much of
the abstract theory of computation and complexity can be constructively implemented.[7]
From this viewpoint, phase transitions in physical systems are seen to
support high levels of computation. And conversely, computers are seen to be phys-
ical systems designed with a subset of "critical" degrees of freedom that support
computational fluctuations.
The discussion has a top-down organization with three major parts. The first,
consisting of this section and the next, introduces the motivations and general
formalism of applying computational ideas to modeling dynamical systems. The
second part develops the basic tools of e-machine reconstruction and a statistical
mechanical description of the machines themselves. The third part applies the tools
to the particular class of complex behavior seen in cascade transitions to chaos. A
few words on further applications conclude the presentation.
[5]In fact, the invariant actually used is a much coarsened version of the algorithmic complexity:
a polynomial-time reduction is required only to preserve the exponential character of solving a
hard problem.
[6]We note that computation theory also allows one to formalize how much effort is required to
infer a dynamical system from observed data. Although related, this is not our present concern.27
[7]At the highest computation level of universal Turing machines, descriptions of physical complexity
are simply not constructive since finding the minimal TM program for a given problem is
undecidable in general.38
CONDITIONAL COMPLEXITY
The basic concept of complexity that allows for dynamical systems and computation
theories to be profitably linked relies on a generalized notion of structure that we
will refer to generically as "symmetry." In addition to repetitive structure, we also
consider statistical regularity to be one example of symmetry. The idea is that a
data set is complex if it is the composite of many symmetries.
To connect back to the preceding discussion, we take as two basic dynamical
symmetries those represented by the model basis {Bt,Pt}. A complex process will
have, at the very least, some nontrivial combination of these components. Simply
predictable behavior and purely random behavior will not be complex. The corre-
sponding complexity spectrum is schematically illustrated in Figure 1.
More formally, we define the conditional complexity C(D|S) to be the amount
of information in equivalence classes induced by the symmetry S in the data D plus
the amount of data that is "unexplained" by S. If we had some way of enumerating
all symmetries, then the absolute complexity C(D) would be

C(D) = \inf_{\{S\}} C(D|S) .
And we would say that an object is complex if, after reduction, it is its own sym-
metry. In that case, there are no symmetries in the object, other than itself.[8] If
D is the best model of itself, then there is no unexplained data, but the model is
large: C(D|D) ∝ length(D). Conversely, if there is no model, then all of the data
[8]Or, said another way, the complex object is only described by a large number of equivalence
classes induced by inappropriate symmetries. The latter can be illustrated by considering an
inappropriate description of a simple object. A square wave signal is infinitely complex with
respect to a Fourier basis. But this is not an intrinsic property of square waves, only of the choice
of model basis. There is a model basis that gives a very simple description of a square wave.
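The square-wave point can be made concrete. The Fourier-sine coefficients of a unit square wave are b_k = 4/(πk) on the odd harmonics and 0 on the even ones, so a Fourier "model basis" needs infinitely many terms, while a time-domain description (period and duty cycle) is just two numbers. A small numerical sketch (our illustration, not from the text):

```python
from math import sin, pi

def b_k(k: int, N: int = 20000) -> float:
    """Fourier-sine coefficient b_k of the unit square wave on [0, 1),
    approximated by a midpoint Riemann sum of 2 * integral sq(t) sin(2 pi k t) dt."""
    total = 0.0
    for n in range(N):
        t = (n + 0.5) / N
        sq = 1.0 if t < 0.5 else -1.0
        total += 2.0 * sq * sin(2.0 * pi * k * t)
    return total / N

odd = [b_k(k) for k in (1, 3, 5, 7)]   # analytically 4/(pi k): never vanish
even = [b_k(k) for k in (2, 4)]        # analytically 0
```

The odd coefficients decay only as 1/k, so truncating the Fourier description at any finite order always leaves structure unexplained.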
228 Complexity, Entropy, and the Physics of Information
[9]This computational framework for modeling also applies, in principle, to estimating symbolic
equations of motion from noisy continuous data.21 Generally, minimization is an application of
Occam's Razor in which the description is considered to be a "theory" explaining the data.42
Rissanen's minimum-description-length principle, the coding theoretic version of this philosophical
axiom, yields asymptotically optimal representations.61,62
[10]In information theoretic terms we are requiring stationarity and ergodicity of the source.
[11]We are necessarily skipping over a number of details, such as how the state x_t is discretized
into a string over a finite alphabet. The basic point made here has been emphasized some time
ago.8,17
situations they measure the same dynamical property captured by the informa-
tion theoretic phrase "entropy." "Complexity" shall refer to conditional complexity
with respect to BTM computational models. We could qualify it further by using
"physical complexity," but this is somewhat misleading since it applies equally well
outside of physics.[12]
We are not aware of any means of enumerating the space of symmetries and so
the above definition of absolute complexity, while of theoretical interest, is of little
immediate application. Nonetheless, we can posit that symmetries S be effectively
computable in order to be relevant to scientific investigation. According to the
physical variant of the Church-Turing thesis then, S can be implemented on a
BTM; which is to say that as far as realizability is concerned, the unifying class of
symmetries that we have in mind is represented by operations of a BTM. Although
the mathematical specification for a BTM is small, its range of computation is vast
and at least as large as the underlying UTM. It is, in fact, unnecessarily powerful
so that many questions, such as finding a minimal program for given data, are
undecidable and many quantities, such as the conditional complexity C(DIBTM),
are noncomputable. More to the point, adopting too general a computational model
results in there being little to say about a wide range of physical processes.
Practical measures of complexity are based on lower levels of Chomsky's computational
hierarchy.[13] Indeed, Turing machines appear only at the pinnacle of
this graded hierarchy. The following concentrates on deterministic finite automata
(DFA) and stack automata (SA) complexity, the lowest two levels in the hierarchy.
DFAs represent strictly clock and coin-flip modeling. SAs are DFAs augmented by
an infinite memory with restricted push-down stack access. We will demonstrate
how DFA models break down at a chaotic phase transition and how higher levels
of computational model arise naturally. Estimating complexity types beyond SAs,
such as linear bounded automata (LBA), is fraught with certain intriguing difficul-
ties and will not be attempted here. Nonetheless, setting the problem context as
broadly as we have just done is useful to indicate the eventual goals that we have
in mind and to contrast the present approach to other long-standing proposals that
UTMs are the appropriate framework with which to describe the complexity of
natural processes.[14] Even with the restriction to Chomsky's lower levels, a good
deal of progress can be made since, as will become clear, contemporary statistical
mechanics is largely associated with DFA modeling.
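The gap between the lowest Chomsky levels can be illustrated with two toy recognizers (our own illustration, not from the text): a two-state DFA, which has only finite memory, and a push-down (stack) recognizer for the non-regular language {a^n b^n}, whose unbounded stack supplies the counting that no finite-state machine can do.

```python
def dfa_accepts_even_parity(s: str) -> bool:
    """A two-state DFA: accepts binary strings with an even number of 1s.
    Finite memory only -- the hallmark of the lowest level."""
    state = 0
    for c in s:
        if c == "1":
            state ^= 1
    return state == 0

def sa_accepts_anbn(s: str) -> bool:
    """A push-down recognizer for {a^n b^n}: push each 'a', pop one per 'b';
    accept iff the stack empties exactly and no 'a' follows a 'b'."""
    stack = []
    seen_b = False
    for c in s:
        if c == "a":
            if seen_b:
                return False
            stack.append(c)
        elif c == "b":
            seen_b = True
            if not stack:
                return False
            stack.pop()
        else:
            return False
    return not stack
```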
[12]This definition of complexity and its basic properties as represented in Figure 1 were presented
by the first author at the International Workshop on "Dimensions and Entropies in Chaotic
Systems," Pecos, New Mexico, 11-16 September 1985.
[13]Further development of this topic is given elsewhere.28,29
[14]We have in mind Kolmogorov's work48 over many years that often emphasizes dynamical and
physical aspects of this problem. Also, Bennett's notion of "logical depth" and his analysis of physical
processes typically employ UTM models.5 Wolfram's suggestion that the computational
properties of intractability and undecidability will play an important role in future theoretical
physics assumes UTMs as the model basis. More recently, Zurek has taken up UTM descriptions
of thermodynamic processes. The information metric used there was also developed from a
conditional complexity.24
RECONSTRUCTING ε-MACHINES
To effectively measure intrinsic computational properties of a physical system we
infer an ε-machine from a data stream obtained via a measuring instrument.25 An
ε-machine is a stochastic automaton of the minimal computational power yielding
a finite description of the data stream. Minimality is essential. It restricts the scope
of properties detected in the ε-machine to be no larger than those possessed by the
underlying physical system. We will assume that the data stream is governed by a
stationary measure. That is, the probabilities of fixed length blocks of measurements
exist and are time-translation invariant.
The goal, then, is to reconstruct from a given physical process a computation-
ally equivalent machine. The reconstruction technique, discussed in the following,
is quite general and applies directly to the modeling task for forecasting temporal
or spatio-temporal data series. The resulting minimal machine's structure indicates
the inherent information processing, i.e., transmission and computation, of the orig-
inal physical process. The associated complexity measure quantifies the ε-machine's
informational size; in one limit, it is the logarithm of the number of machine states.
The machine's states are associated with historical contexts, called morphs, that are
optimal for forecasting. Although the simplest (topological) representation of an
ε-machine at the lowest computational level (DFAs) is in the form of labeled directed
graphs, the full development captures the probabilistic (metric) properties of the
data stream. Our complexity measure unifies a number of disparate attempts to de-
scribe the information processing of nonlinear physical systems.4,6,17,19,21,35,59,65,70
The following two sections develop the reconstruction method for the machines and
their statistical mechanics.
The initial task of inferring automata from observed data falls under the
purview of grammatical inference within formal learning theory.22 The inference
technique uses a particular choice S of symmetry that is appropriate to forecasting
the data stream in order to estimate the conditional complexity C(DIS). The aim is
to infer generalized "states" in the data stream that are optimal for forecasting. We
will identify these states with measurement sequences giving rise to the same set of
possible future sequences.[15] Using the temporal translation invariance guaranteed
by stationarity, we identify these states using a sliding window that advances one
measurement at a time through the sequence. This leads to the second step in the
inference technique, the construction of a parse tree for the measurement sequence
probability distribution. This is a coarse-grained representation of the underlying
process' measure in orbit space. The state identification requirement then leads to
an equivalence relation on the parse tree. The machine states correspond to the
induced equivalence classes; the state transitions, to the observed transitions in the
tree between the classes. We now give a more formal development of the inference
method.
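A minimal, purely topological sketch of this inference idea (our simplification of the procedure described above): treat two length-L measurement histories as the same state when the sets of length-L futures observed after them coincide. Run on data from the golden-mean process (binary sequences with no "11" block, our illustrative example), it recovers the expected two states.

```python
import random
from collections import defaultdict

def reconstruct_states(seq: str, L: int):
    """Group length-L history words into states: two histories are merged
    when the sets of length-L futures seen after them are identical
    (a purely topological sketch of subtree similarity)."""
    futures = defaultdict(set)
    for i in range(len(seq) - 2 * L + 1):
        futures[seq[i:i + L]].add(seq[i + L:i + 2 * L])
    classes = defaultdict(list)
    for past, fut in futures.items():
        classes[frozenset(fut)].append(past)
    return [sorted(c) for c in classes.values()]

# Golden-mean process data stream: after a "1" the next symbol must be "0".
rng = random.Random(1)
bits = ["0"]
for _ in range(20000):
    bits.append("0" if bits[-1] == "1" else rng.choice("01"))
seq = "".join(bits)
states = reconstruct_states(seq, 2)
```

Here the histories "00" and "10" share the same future set and merge into one state, while "01" (which forces the next symbol to be 0) forms the other.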
[15] We note that the same construction can be done for past possibilities. We shall discuss this
alternative elsewhere.
The first step is to obtain a data stream. The main modeling ansatz is that the
underlying process is governed by a noisy discrete-time dynamical system
x_{n+1} = F(x_n) + \xi_n
[16]We ignore for brevity's sake the question of extracting from a single component {x_{t,i}} an
adequate reconstructed state space.57
[17]The picture here is that a particular L-cylinder is a name for that bundle of orbits {x_n} each
of which visited the sequence of partition elements indexed by the L-cylinder.
This gives a hierarchical approximation of the measure in orbit space. Tree
representations of data streams are closely related to the hierarchical algorithm
used for estimating dynamical entropies.17,26
At the lowest computational level ε-machines are represented by a class of labeled,
directed multigraphs, or ℓ-digraphs.30 They are related to the Shannon graphs
of information theory,63 to Weiss's sofic systems in symbolic dynamics,33 to discrete
finite automata in computation theory,38 and to regular languages in Chomsky's
hierarchy.11 Here we are concerned with probabilistic versions of these. Their topological
structure is described by an ℓ-digraph G = {V, E} that consists of vertices
V = {v_i} and directed edges E = {e_i} connecting them, each of the latter labeled
by a symbol s ∈ A.
To reconstruct a topological ε-machine we define an equivalence relation, subtree
similarity, denoted ~, on the nodes of the tree T by the condition that the
L-subtrees are identical:
where p(v'|v; s) is the transition probability from vertex v to v' along an edge labeled
with symbol s, p(s|v) is the probability that s is emitted on leaving v, and
p_v is the probability of vertex v. A deterministically accepting ε-machine is reconstructible
from L-level equivalence classes if the indeterminacy vanishes. Finite indeterminacy, at
some given {L, e, r, b}, indicates a residual amount of extrinsic noise at that level
of approximation. In this case, the optimal machine in a set of machines consistent
with the data is the smallest that minimizes the indeterminacy.27
A = \{ i : i = 0, \ldots, k - 1;\ k = O(\epsilon^{-m}) \}

for the vertices v_i ∈ V. We will distinguish two subsets of vertices. The first, V_t,
consists of those associated with transient states; the second, V_r, consists of recurrent
states. The α-order total Renyi entropy,60 or "free information," of the measurement
sequence up to n-cylinders is given by
H_\alpha(n) = (1 - \alpha)^{-1} \log \sum_{s^n \in \{s^n\}} p^\alpha(s^n)
with the probabilities p(s^n) defined on the n-cylinders {s^n}. The Renyi specific
entropy, i.e., entropy per measurement, is approximated17 from the n-cylinder dis-
tribution by
h_\alpha(n) = n^{-1} H_\alpha(n)
or h_\alpha(n) = H_\alpha(n) - H_\alpha(n - 1)
and is given asymptotically by

h_\alpha = \lim_{n \to \infty} h_\alpha(n) .
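These estimators are straightforward to evaluate from empirical cylinder counts. The sketch below (the fair-coin data stream and window length are our illustrative choices) recovers the specific entropy h_α = log 2 of a binary full shift for both the topological (α = 0) and correlation (α = 2) orders.

```python
import math
import random
from collections import Counter

def renyi_total_entropy(seq: str, n: int, alpha: float) -> float:
    """Estimate the alpha-order total Renyi entropy H_alpha(n) from the
    empirical n-cylinder distribution of a symbol stream (a sketch)."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if abs(alpha - 1.0) < 1e-12:           # Shannon limit
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)

# Fair-coin data: every specific entropy h_alpha(n) = H_alpha(n)/n is log 2.
random.seed(2)
stream = "".join(random.choice("01") for _ in range(200_000))
h0 = renyi_total_entropy(stream, 8, 0.0) / 8   # topological (counting) case
h2 = renyi_total_entropy(stream, 8, 2.0) / 8   # a metric-side order
```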
The parameter α has several interpretations, all of interest in the present context.
From the physical point of view, α (= 1 − β) plays the role of the inverse
temperature β in the statistical mechanics of spin systems.39 The spin states correspond
to measurements and a configuration of spins on a spatial lattice to a temporal
sequence of measurements. Just as the temperature increases the probability of
different spin configurations by increasing the number of available states, α accentuates
different subsets of measurement sequences in the asymptotic distribution.
From the point of view of Bayesian inference, α is a Lagrange multiplier specifying
a maximum entropy distribution consistent with the maximum likelihood distribution
of observed cylinder probabilities.41 Following symbolic dynamics terminology,
α = 0 will be referred to as the topological or counting case; α = 1, as the metric
or probabilistic case or high-temperature limit. Varying α moves continuously from
topological to metric machines. Originally, in his studies of generalized information
measures, Renyi introduced α as just this type of interpolation parameter and noted
that the α-entropy has the character of a Laplace transform of a distribution.60 Here
there is the somewhat pragmatic, and possibly more important, requirement for α:
it gives the proper algebra of trajectories in orbit space. That is, α is necessary
for computing measurement sequence probabilities from the stochastic connection
matrix T_α. Without it, products of T_α fail to distinguish distinct sequences.
An ε-machine's structure determines several key quantities. The first is the
stochastic DFA measure of complexity. The α-order graph complexity is defined as

C_\alpha = (1 - \alpha)^{-1} \log \sum_{v \in V} p_v^\alpha
\bar{p}_V = \{ p_v : v \in V \}
A complexity based on the asymptotic edge probabilities \bar{p}_E = \{ p_e : e \in E \} can also
be defined

C_\alpha^E = (1 - \alpha)^{-1} \log \sum_{e \in E} p_e^\alpha .
\bar{p}_E is given by the left eigenvector of the ε-machine's edge graph. The transition
complexity C_\alpha^E is simply related to the entropy and graph complexity by

C_\alpha^E = C_\alpha + h_\alpha .

There are, thus, only two independent quantities for a finite DFA ε-machine.27
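As a sketch, C_α can be computed directly from a machine's asymptotic state probabilities. The stationary distribution (2/3, 1/3) used below is that of the golden-mean machine with fair-coin branching, our illustrative example rather than a value from the text.

```python
import math

def graph_complexity(p_v, alpha: float) -> float:
    """C_alpha = (1 - alpha)^{-1} log sum_v p_v^alpha over the machine's
    asymptotic state probabilities (Shannon form in the alpha -> 1 limit)."""
    if abs(alpha - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in p_v if p > 0)
    return math.log(sum(p ** alpha for p in p_v if p > 0)) / (1.0 - alpha)

# Stationary state probabilities of the golden-mean machine (illustrative).
p_v = [2.0 / 3.0, 1.0 / 3.0]
c0 = graph_complexity(p_v, 0.0)   # topological: log of the state count
c1 = graph_complexity(p_v, 1.0)   # metric: Shannon information in the states
```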
The two limits for α mentioned above warrant explicit discussion. For the first,
topological case (α = 0), T_0 is the ℓ-digraph's connection matrix. The Renyi entropy
h_0 = \log \lambda_0 is the topological entropy h. And the graph complexity is

C_0 = \log |V| .

This is C(s|DFA): the size of the minimal DFA description, or "program," required
to produce sequences in the observed measurement language of which s is a member.
This topological complexity counts all of the reconstructed states. It is similar to
the regular language complexity developed for cellular-automaton-generated spatial
patterns. The DFAs in that case were constructed from known equations of motion
and an assumed neighborhood template. Another related topological complexity
counts just the recurrent states V_r. The distinction between this and C_0 should be
clear from the context in which they are used in later sections.
In the second, metric case (α = 1), h_α becomes the metric entropy

h_\mu = \lim_{\alpha \to 1} h_\alpha = -\left. \frac{d\lambda_\alpha}{d\alpha} \right|_{\alpha = 1} .
The metric complexity

C_\mu = \lim_{\alpha \to 1} C_\alpha = -\sum_{v \in V} p_v \log p_v
is the Shannon information contained in the morphs.[18] Following the preceding remarks,
the metric entropy is also given directly in terms of the stochastic connection
matrix
h_\mu = -\sum_{v \in V} \sum_{v' \in V} \sum_{s \in A} p_v \, p(v'|v; s) \log p(v'|v; s) .
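The entropy formula above can likewise be evaluated from a machine's stationary state probabilities and labeled transition probabilities. The golden-mean machine with fair-coin branching used below is our illustrative example; it gives h_μ = (2/3) log 2.

```python
import math

def metric_entropy(p_v, trans) -> float:
    """h_mu = -sum_v p_v sum_{(v', s)} p(v'|v; s) log p(v'|v; s), evaluated
    from stationary state probabilities and labeled transitions (a sketch)."""
    h = 0.0
    for v, p in enumerate(p_v):
        for (v_next, s), q in trans[v].items():
            if q > 0.0:
                h -= p * q * math.log(q)
    return h

# Golden-mean machine (illustrative): state 0 emits '0' (stay) or '1'
# (go to state 1) with probability 1/2 each; state 1 must emit '0'.
p_v = [2.0 / 3.0, 1.0 / 3.0]
trans = [{(0, "0"): 0.5, (1, "1"): 0.5}, {(0, "0"): 1.0}]
h_mu = metric_entropy(p_v, trans)
```

Only the branching state contributes: the forced transition out of state 1 has log-probability zero.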
c_\alpha = \lim_{L \to \infty} \frac{C_\alpha(L)}{L}
vanishes, then the noisy dynamical system has been identified. If it does not vanish,
then cc, is a measure of the rate of divergence of the model size and so quantifies
a higher level of computational complexity. In this case, the model basis must be
augmented in an attempt to find a finite description at some higher level. The fol-
lowing sections will demonstrate how this can happen. A more complete discussion
of reconstructing various hierarchies of models is found elsewhere.29
[18]Cf. the "set complexity" version of the regular language complexity35 and the "diversity" of undirected,
unlabeled trees.4
PERIOD-DOUBLING CASCADES
To give this general framework substance and to indicate the importance of quan-
tifying computation in physical processes, the following sections address a concrete
problem: the complexity of cascade transitions to chaos. The onset of chaos often
occurs as a transition from an ordered (solid) phase of periodic behavior to a dis-
ordered (gas) phase of chaotic behavior. A cascade transition to chaos consists of a
convergent sequence of individual "bifurcations," either pitchfork (period-doubling)
in the periodic regimes or band-merging in the chaotic regimes.[19]
The canonical model class of these transitions is parametrized two-lap maps of
the unit interval, x_{n+1} = f(x_n), x_n ∈ [0, 1], with negative Schwarzian derivative,
that is, those maps with two monotone pieces and admitting only a single attractor.
We assign to the domain of each piece the letters of the binary alphabet Σ = {0, 1}.
The sequence space Σ* consists of all 0-1 sequences. Some of these maps, such as
the piecewise-linear tent map described in a later section, need not have the period-
doubling portion of the cascade. Iterated maps are canonical models of cascade
transitions in the sense that the same bifurcation sequence occurring in a set of
nonlinear ordinary differential equations (say) is topologically equivalent to that
found in some parametrized map.12,32,37
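A numerical sketch of the periodic side of such a cascade, using the logistic map x → rx(1 − x) as the parametrized map (a standard representative of this class; the parameter values below are illustrative): the attractor's period doubles as r crosses successive pitchfork bifurcations.

```python
def attractor_period(r: float, max_period: int = 64) -> int:
    """Detect the period of the logistic map's attractor at parameter r.
    Returns 0 if no period <= max_period is found (e.g., chaotic regime)."""
    x = 0.4
    for _ in range(10000):            # discard the transient
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(max_period):
        x = r * x * (1.0 - x)
        orbit.append(x)
    for p in range(1, max_period):
        if abs(orbit[-1] - orbit[-1 - p]) < 1e-9:
            return p
    return 0

p1 = attractor_period(3.2)   # between the first and second pitchforks
p2 = attractor_period(3.5)   # after the second period doubling
```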
Although ε-machines were developed in the context of reconstructing computa-
tional models from data series, the underlying theory provides an analytic approach
to calculating entropies and complexities for a number of dynamical systems. This
allows us to derive in the following explicit bounds on the complexity and entropy
for cascade routes to chaos. We focus on the periodic behavior near pitchfork bifur-
cations and chaotic behavior at band mergings with arbitrary basic periodicity.14,15
In distinction to the description of universality of the period-doubling route to
chaos in terms of parameter variation,31 we have found a phase transition in com-
plexity that is not explicitly dependent on control parameters.25 The relationship
between the entropy and complexity of cascades can be said to be super-universal
in this sense. This is similar to the topological equivalence of unimodal maps of
the interval,13,36,51,52,55 except that it accounts for statistical and computational
structures associated with the behavior classes.
In this and the next sections we derive the total entropy and complexity as a
function of cylinder length n for the set of ε-machines describing the behavior at
the different parameter values for the period-doubling and band-merging cascades.
The sections following this, then, develop several consequences, viz. the order and
the latent complexity of the cascade transition. With these statistical mechanical
results established, the discussion turns to a detailed analysis of the higher level
computation at the transition itself.
1191 The latter are not, strictly speaking, bifurcations in which an eigenvalue of the linearized
problem crosses the unit circle. The more general sense of bifurcation is nonetheless a useful
shorthand for qualitative changes in behavior as a function of a control parameter.
H_\alpha(n, m) = \log N(n, m)
For periodic behavior and assuming n > P, the number of n-cylinders is given by
the period: N(n, m) = P. The total entropy is then H_\alpha(n, m) = \log P. Note that, in
this case, h_\alpha vanishes.
Similarly, the complexity is given in terms of the number V_r = |V_r| of recurrent
states

C_\alpha = (1 - \alpha)^{-1} \log \sum_{v \in V_r} p_v^\alpha
        = (1 - \alpha)^{-1} \log V_r^{1 - \alpha}
        = \log V_r .
The number V_r of vertices is also given by the period for periodic behavior and so
we find C_\alpha = \log P. Thus, for periodic behavior the relationship between the total
and specific entropies and complexity is simple

C_\alpha = H_\alpha
or C_\alpha = n h_\alpha(n) .
This relationship is generally true for periodic behavior and is not restricted to the
situation where dynamical systems have produced the data. Where noted in the
following we will also use C_0 = \log |V| to measure the total number of machine
states.
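For periodic symbolic data these relations are easy to verify numerically; the sketch below checks H_0(n) = log P for a period-4 orbit (the symbol string is our illustrative example).

```python
import math

def total_entropy_topological(seq: str, n: int) -> float:
    """H_0(n) = log N(n): the log of the number of distinct n-cylinders."""
    return math.log(len({seq[i:i + n] for i in range(len(seq) - n + 1)}))

P = 4
seq = "0110" * 500                         # a period-4 symbolic orbit
H0 = total_entropy_topological(seq, 10)    # expect log P once n >= P
```

Each of the P phases of the cycle contributes exactly one distinct n-cylinder, so the count saturates at P.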
CHAOTIC CASCADES
In the chaotic regime the situation is much more interesting. The ε-machines at
periodicity q = 1 and m-order band-mergings 2^m → 2^{m-1}, m = 0, 1, 2, 3, are shown
in Figures 6-9.
FIGURE 6 Topological ℓ-digraph for the single-band chaotic attractor.
FIGURE 7 Topological ℓ-digraph for the 2 → 1 band chaotic attractor.
FIGURE 8 Topological ℓ-digraph for the 4 → 2 band chaotic attractor.
FIGURE 9 Topological ℓ-digraph for the 8 → 4 band chaotic attractor.
The graph complexity is still given by the number V_r of recurrent states as
above. The main analytic task comes in estimating the total entropy. In contrast
to the periodic regime the number of distinct subsequences grows with n-cylinder
length for all n. Asymptotically, the growth rate of this count is given by the specific
topological entropy. In order to estimate the total topological entropy at finite n,
however, more careful counting is required than in the periodic case. This section
develops an exact counting technique for all cylinder lengths that applies at chaotic
parameter values where the orbit f^n(x^*) of the critical point x^*, where f'(x^*) = 0,
is asymptotically periodic. These orbits are unstable and embedded in the chaotic
attractor. The set of such values is countable. At these (Misiurewicz) parameters
there is an absolutely continuous invariant measure.54
There is an additional problem with the arguments used in the periodic case.
The uniform distribution of cylinders no longer holds. The main consequence is
that we cannot simply translate counting N(n, m) directly into an estimate of
H_{α>0}(n, m). One measure of the degree to which this is the case is given by the
difference between the topological entropy h and the metric entropy h_μ.17
Approximations for the total Renyi entropy can be developed using the exact
cylinder-counting methods outlined below and the machine state and transition
probabilities. The central idea for this is that the states represent a
Markov partition of the symbol sequence space Σ*. There are invariant subsets of
Σ*, each of which converges at its own rate to "equilibrium." Each subset obeys
the Shannon-McMillan theorem7 individually. At each cylinder length each subset
is associated with a machine state. And so the growth in the total entropy in each
subset is governed by the machine's probabilistic properties. Since the cylinder-
counting technique captures a sufficient amount of the structure, however, we will
not develop the total Renyi entropy approximations here and instead focus on the
total topological entropy.
We now turn to an explicit estimate of N(n, m) for various cases. Although
the techniques apply to all Misiurewicz parameters, we shall work through the
2 → 1 band-merging case explicitly.
N(n, 1) = 1 + Σ_{i=0}^{⌊(n−1)/2⌋} 2^i + Σ_{i=0}^{⌊(n−2)/2⌋} 2^i

where ⌊k⌋ is the largest non-negative integer not greater than k. The second term on the
right counts the number of tree nodes that branch at even-numbered levels, the third
term is the number that branch at odd levels, and the first term counts the transient
spine that adds a single cylinder. For n > 2 and even, this can be developed into a
renormalized expression that yields a closed form as follows
FIGURE 11 Subtree of nodes associated with asymptotic vertices in the ℓ-digraph for two bands merging to one.
N(n, 1) = 1 + 2 Σ_{i=0}^{n/2−1} 2^i
        = 1 + 2 (N(n, 1) − 2^{n/2})

or N(n, 1) = 2 (2^{n/2} − 2^{−1}) .

For n > 2 and odd, we find N(n, 1) = 3 · 2^{(n−1)/2} − 1. This gives an upper bound
on the growth envelope as a function of n; the former is a lower bound.
The analogous expression for the 4 → 2 band cylinder count can be explicitly
developed. Figure 12 shows the transient spine on the tree that determines the
counting structure. In this case, the sum is

N(n, 2) = 2 + 2^{⌊(n−1)/4⌋} + 2^{⌊(n−2)/4⌋} + Σ_{k=3}^{6} Σ_{i=0}^{⌊(n−k)/4⌋} 2^i .
There are seven terms on the right-hand side. In order they account for
1. The two transient cycles, begun on 0 and 1, each of which contributes 1 node
per level;
2. Cycles on the attractor that are fed into the attractor via non-periodic tran-
sients (second and third terms); and
3. Sum over tree nodes that branch by a factor of 2 at level k + 4i, k = 3,4,5,6,
respectively.
FIGURE 12 Transient spine for the 4 → 2 band attractor. The asymptotic subtrees are labeled with the associated ℓ-digraph vertex. (Compare Figure 8.)
The sum greatly simplifies upon rescaling the indices to obtain a self-similar form.
For n > P = 4 and n ≡ 0 (mod 4), we find

N(n, 2) = 4 (2^{n/4} − 2^{−2}) .
For completeness we note that this approach also works for the single band
(m = 0) case

N(n, 0) = 1 + Σ_{i=0}^{n−1} 2^i
        = 1 + 2 Σ_{i=0}^{n−1} 2^i − (2^n − 1)
        = 2 N(n, 0) − 2^n

or N(n, 0) = 2^n .
The preceding calculations were restricted by the choice of a particular phase
of the asymptotic cycle at which to count the cylinders. With a little more effort
a general expression for all phases is found. Noting the similarity of the l-digraph
structures between different order band-mergings and generalizing the preceding
recursive technique yields an expression for arbitrary order band-merging. This
takes into account the fact that the generation of new n-cylinders via branching
occurs at different phases on the various limbs of the transient spine. The number
of n-cylinders from the exact enumeration for the q = 1, 2^m → 2^{m−1} band-merging
is

N(n, m) = 2^n                                    for m = 0
N(n, m) = 2^m (b_{n,m} 2^{n 2^{−m}} − 2^{−m})    for m ≠ 0

where n > P = 2^m and b_{n,m} = (1 + ñ) 2^{−ñ}, with ñ = 2^{−m}(n mod 2^m), accounts
for the effect of relative branching phases in the spine. This coefficient is bounded:

b_min = inf_{n, m≠0} b_{n,m} = 1
b_max = sup_{n, m≠0} b_{n,m} = 3 · 2^{−3/2} ≈ 1.0606602
The second bound follows from noting that the maximum occurs when, for example,
n = 2^m + 2^{m−1}. Note that the maximum and minimum values of the prefactor
are independent of n and m. We will ignore the detailed phase
dependence, simply write b instead of b_{n,m}, and consider the lower bound case
of b = 1.
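Taking the counting formula as reconstructed here at face value, a short numerical cross-check (a sketch; the function names are ours) confirms that the general expression reproduces the m = 1 closed forms for even and odd n, the m = 0 full binary tree, and the stated bounds on b_{n,m}:

```python
# Cross-check of the band-merging cylinder-count formula as reconstructed
# here: N(n, m) = 2^m * (b(n, m) * 2^(n * 2^-m) - 2^-m) for m != 0, with
# phase factor b(n, m) = (1 + nt) * 2^(-nt), nt = 2^-m * (n mod 2^m).
from math import isclose

def b(n, m):
    nt = (n % 2**m) / 2**m           # relative branching phase in [0, 1)
    return (1 + nt) * 2**(-nt)

def N(n, m):
    if m == 0:
        return 2**n                  # single chaotic band: full binary tree
    return 2**m * (b(n, m) * 2**(n * 2**-m) - 2**-m)

# m = 1 (two bands merging to one): the even/odd closed forms
for n in range(4, 20, 2):            # even n: N = 2^(n/2 + 1) - 1
    assert isclose(N(n, 1), 2**(n // 2 + 1) - 1)
for n in range(5, 21, 2):            # odd n:  N = 3 * 2^((n-1)/2) - 1
    assert isclose(N(n, 1), 3 * 2**((n - 1) // 2) - 1)

# The phase factor is bounded, 1 <= b <= 3 * 2^(-3/2) ~ 1.0606602, with
# the supremum at n = 2^m + 2^(m-1), i.e. nt = 1/2.
assert isclose(b(2**3 + 2**2, 3), 3 * 2**-1.5)
assert min(b(n, 2) for n in range(100)) == 1.0
```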
Recalling that C₀ = log₂ V_r = m, we have

H₀(n, m) = C₀ + log₂ (2^{n 2^{−m}} − 2^{−m})

where we have set b = 1. The first term recovers the linear interdependence that
derives from the asymptotic periodicity; cf. the period-doubling case. The second
term is due to the additional feature of chaotic behavior that, in the band-merging
case, is reflected in the branching and transients in the 1-digraph structure. In
terms of the modeling decomposition introduced at the beginning, the first term
corresponds to the periodic process Pt and the branching portion of the second
term, to components isomorphic to the Bernoulli process Bt.
From the development of the argument, we see that the factor 2^{−m} in the
exponent controls the branching rate in the asymptotic cycle and so should be
related to the rate of increase of the number of cylinders. The topological entropy
is the growth rate of H₀ and so can now be determined directly

h₀ = lim_{n→∞} n^{−1} H₀(n, m) = 2^{−m} .
Rewriting the general expression for the lower bound in a chaotic cascade makes it
clear how h₀ controls the total entropy

H₀(n) = C₀ + log₂ (2^{h₀ n} − h₀)

where h₀ = f/V_r is the branching ratio of the number f of vertices that branch to
the total number V_r of recurrent states.
The above derivation used periodicity q = 1. For general periodicity band-merging,
we have V_r = q · 2^m and f = 1. It is clear that the expression works
for a much wider range of ε-machines with isolated branching within a cycle that
do not derive from cascade systems. Indeed, the results concern the relationship
between eigenvalues and asymptotic state probabilities in the family of labeled
Markov chains with isolated branching among cyclic recurrent states.
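The eigenvalue statement can be illustrated numerically. The sketch below (a hypothetical 4-state example of ours, not from the text) builds a cyclic labeled Markov chain with a single branching vertex and verifies that the growth rate of distinct label sequences is h₀ = f/V_r:

```python
# Topological entropy of a labeled Markov chain with isolated branching:
# V_r states in a cycle, exactly f = 1 vertex with two outgoing, distinctly
# labeled edges. Distinct label sequences of length n then number
# 2^(#visits to the branching vertex), so h0 = f / V_r bits per symbol.
from math import log2

V = 4                                 # V_r: recurrent states in the cycle
n_steps = 400
# paths[i] = number of distinct label paths currently ending at state i
paths = [0] * V
paths[0] = 1                          # start at the branching vertex
for _ in range(n_steps):
    new = [0] * V
    for i in range(V):
        succ = (i + 1) % V            # the cycle: i -> i+1 (mod V)
        fanout = 2 if i == 0 else 1   # vertex 0 branches: two labeled edges
        new[succ] += fanout * paths[i]
    paths = new

h0 = log2(sum(paths)) / n_steps       # empirical growth rate of word count
assert abs(h0 - 1 / V) < 0.01         # h0 = f / V_r with f = 1
```

Since every fourth step passes through the branching vertex, the word count after 400 steps is exactly 2^100, giving h₀ = 100/400 = 1/4.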
As a subset of all Misiurewicz parameter values, band-merging behavior has the
simplest computational structure. In closing this section, we should point out that
there are other cascade-related families of Misiurewicz parameters whose machines
are substantially more complicated in the sense that the stochastic element is more
than an isolated branching. Each family is described by starting with a general
labeled Markov chain as the lowest-order machine. The other family members are
obtained by applications of a period-doubling operator.12 Each is a product of
a periodic process and the basic stochastic machine. As a result of this simple
decomposition, the complexity-entropy analysis can be carried out. This will be
reported elsewhere. It explains many of the complexity-entropy properties above the
lower bound case of band-merging. The numerical experiments later give examples
of all these types of behavior.
where y is the solution of

y log₂ y − y + 1/2 = 0

and ΔC = C⁺ − C⁻. Along the periodic branch the entropy and complexity are equal, and so from the
previous development we see that ΔC = log₂ (b y − 1).
As the cylinder length n grows, the finite-n estimates n⁻¹H(n) go over to the entropy growth rates h_α. As a result, all of the periodic
behavior lies on the h_α = 0 line in the (h_α, C_α)-plane. This limiting behavior
is consistent with a zero-temperature phase transition of a one-spatial-dimension
spin system with finite interaction range.
This analysis of the cascade phase transition should be contrasted with the
conventional descriptions based on correlation function and mutual information
decay. The correlation length of a statistical mechanical system is defined most
generally as the minimum size L at which there is no qualitative statistical difference
between the system of size L and the infinite (thermodynamic limit) system. This
is equivalent in the present context to defining a correlation length L_α at which L-
cylinder α-order statistics are close to asymptotic.[20] If we consider the total entropy
H_α(L) as the (dis)order parameter of interest, then for finite ε-machines,[21] away
from the transition on the chaotic side, we expect its convergence to asymptotic
statistics to behave like
2^{H_α(L)} ∝ λ_α^L .

But for L sufficiently large

2^{H_α(L)} ∝ 2^{h_α L}
where h_α = log₂ λ_α. By this argument, the correlation length is simply related
to the inverse of the specific entropy: L_α ∝ h_α^{−1}. We would conclude, then, that
the correlation function description of the phase transition is equivalent in many
respects to that based on specific entropy.
Unfortunately, this argument, which is often used in statistical mechanics, con-
fuses the rate of decay of correlation with the correlation length. These quantities
are proportional only assuming exponential decay or, in the present case, assuming
finite ε-machines. The argument does indicate that as the transition is approached
the correlation length diverges since the specific entropy vanishes. For all behav-
ior with zero metric entropy, periodic or exactly at the transition, the correlation
length is infinite. As typically defined, it is of little use in distinguishing the various
types of zero-entropy behavior.
The correlation length in statistical mechanics is determined by the decay of
the two-point autocorrelation function
[20] Cf. the entropy "convergence knee" n_α.19
[21] The statistical mechanical argument, from which the following is taken, equivalently assumes
exponential decay of the correlation function.
where s_i is the ith symbol in the sequence s and H_α(·) is the Renyi entropy.22
Using this to describe phase transitions is an improvement over the correlation
function in that, for periodic data, it depends on the period P: I_α ∝ log₂ P. In
contrast, the correlation function in this case does not decay and gives an infinite
correlation length.
The convergence of cylinder statistics to their asymptotic (thermodynamic
limit) values is most directly studied via the total excess entropy18,25,35,58

F_α(L) = H_α(L) − h_α L .
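As a concrete sketch (our example, using the topological α = 0 case), the excess entropy of a periodic sequence plateaus at log₂ P once L exceeds the period, since H₀(L) saturates while h₀ = 0:

```python
# Total excess entropy F(L) = H(L) - h * L, sketched with the topological
# (alpha = 0) quantities: H0(L) = log2 N(L), where N(L) is the number of
# distinct L-cylinders. For a period-P sequence h0 = 0 and F0(L) saturates
# at log2 P, the stored "structural" information.
from math import log2

def H0(seq, L):
    """Topological block entropy: log2 of the number of distinct L-words."""
    return log2(len({seq[i:i + L] for i in range(len(seq) - L + 1)}))

P = 8
s = "10111010" * 64                   # period-8 sequence (a primitive word)
h0 = 0.0                              # periodic data: zero specific entropy

for L in range(P, 4 * P):
    F = H0(s, L) - h0 * L             # excess entropy estimate
    assert abs(F - log2(P)) < 1e-12   # plateaus at log2 P = 3 bits
```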
hierarchy. With this we obtain a much finer classification than is typical in phase
transition theory.
The structure of the limiting machine can be inferred from the sequence of
machines reconstructed at the 2^m → 2^{m+1} period-doubling bifurcations on the periodic
side and from those reconstructed at the 2^m → 2^{m−1} band-mergings on the chaotic side.
(Compare Figures 2 and 6, 3 and 7, 4 and 8, 5 and 9.) All graphs have transient
states of pair-wise similar structure, except that the chaotic machines have a period
2^{m−1} unstable cycle. All graphs have recurrent states of period 2^m. In the periodic
machines this cycle is deterministic. In the chaotic machines, although the states
are visited deterministically, the edges have a single nondeterministic branching.
The order of the phase transition depends on the structural differences between
the ε-machines above and below the transition to chaos. In general, if this structural
difference alters the complexity at constant entropy, then the transition will be
second order. At the transition to chaos via period doubling there is a difference in
the complexities due to
1. The single vertex in the asymptotic cycle that branches; and
2. The transient 2^{m−1} cycle in the machines on the chaotic side.
At constant complexity the uncertainty developed by the chaotic branching and
the nature of the transient spine determine the amount of dynamic information
production required to make the change from predictable to chaotic ε-machines.
The following two subsections summarize results discussed in detail elsewhere.
CRITICAL MACHINE
The machine M_c that accepts the sequences produced at the transition, although
minimal, has an infinite number of states. The growth of machine size |V(L)| versus
reconstruction cylinder size L at the transition is demonstrated in Figure 14. The
maximum growth is linear with slope c₀ = 3. Consequently, the complexity diverges
logarithmically.[24] The growth curve itself is composed of pieces with alternating
slope 2 and slope 4; the slope-4 pieces have the form

|V(L)| = 4L − 3 , 3 · 2^{m−1} < L < 2^{m+1} .
The slope 2 learning regions correspond to inferring more of the states that link
the upper and lower branches of the machine. (The basic structure will be made
clearer in the discussion of Figure 15 below.) The slope 4 regions are associated
with picking up pairs of states along the long deterministic chains that are the
upper and lower branches. Recalling the definition of c0, in a previous section, we
note that finite co indicates a constant level of complexity using a more powerful
computational model than the DFA.[25]
[25] The general framework for reconstructing machines at different levels in a computational hierarchy is presented elsewhere.29
transition from states signified by squares, the register production is performed first
and then the transition is made. The machine begins in the start state with a "1"
in the register.
M_c accepts the full language L_c produced at the transition, including the transient strings with various prefixes. At its core, though, is the simple recursive production (B → BB′) for the itinerary ω_c of the critical point x*. We will now explore
the structure of this sequence in more detail in order to see just what computational
capabilities it requires. We shall demonstrate how and where it fits in the Chomsky
hierarchy.
Before detailing the formal language properties of the symbol sequences generated
at the cascade transition, several definitions of restricted languages are in order.
First, of course, is the critical language itself Lc which we take to be the set of
all subsequences produced asymptotically by the dynamical system at the cascade
transition. M_c is a deterministic acceptor of L_c. Second, the most restricted language, denoted L₁, is the sequence of the itinerary ω_c of the map's maximum x*.
Note that L₃ is the further generalization including subsequences that start at any
symbol in ω_c.
With these various languages, we can begin to delineate the formal properties of
the transition behavior. First, we note that an infinite number of words occur in Lc
even though the metric entropy is zero. Additionally, there are an infinite number
of inadmissible sequences and so an infinite number of words in the complement
language L̄_c, i.e., words not in L_c. One consequence is that the transition is not
described by a subshift of finite type, since there is no finite list of words whose
concatenation generates L_c.3
Second, in formal language theory, "pumping lemmas" are used to prove that
certain languages are not in some language class.38 Typically this is tantamount to
demonstrating that particular recurrence or cyclic properties of the class are not
obeyed by sufficiently long words in the language in question. Regular languages
(RL) are those accepted by DFAs. Using the pumping lemma for regular languages,
it is easy to demonstrate that L ∈ {L₂, L₃, L_c} is not regular. This follows from
noting that there is no length n such that each word z ∈ L with |z| > n can be
broken into three subwords, z = uvw with |uv| ≤ n, where the middle (nonempty)
subword can be repeated arbitrarily many times. That is, sufficiently long strings
cannot be decomposed such that z = uvw ∈ L ⇒ u vⁱ w ∈ L ∀ i ≥ 0. In fact, no substrings
can be arbitrarily pumped. The lack of such a cyclic property also follows from
noting that in M_c all the states are transient and there are no transient cycles.
The observation of this structural property also leads to the conclusion that Lc is
also not finitely described at the next level of the complexity hierarchy: context-
free languages (CFL), i.e., those accepted by push-down automata. This can be
established directly using the pumping lemma for context-free languages.
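The length argument can be made concrete. In the sketch below (helper name ours), the words of L₂ generated by the production A → AA′ have lengths exactly 2^i, so the arithmetic progression of lengths that pumping would force cannot stay inside the language:

```python
# The pumping lemmas fail for L2 because its word lengths are exactly the
# powers of two: pumping a nonempty subword of length d would put words of
# lengths |z| + i*d (an arithmetic progression) into the language, but the
# gaps between consecutive powers of two grow without bound.
def l2_words(k):
    """First k words of L2 via the production A -> A A' (last bit flipped)."""
    w, out = "1", []
    for _ in range(k):
        out.append(w)
        w = w + w[:-1] + ("0" if w[-1] == "1" else "1")
    return out

words = l2_words(10)
lengths = [len(w) for w in words]
assert lengths == [2**i for i in range(10)]   # 1, 2, 4, 8, ...
# For any pumping length d, some gap 2^(i+1) - 2^i = 2^i eventually
# exceeds d, so the pumped lengths cannot all be powers of two.
```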
Third, in the structural analysis of M_c, we found states at which the following
production is applied: A → AA′, where A′ = s₀ ⋯ s̄_k if A = s₀ ⋯ s_k and s̄ is
the complement of s. This production generates L₁ and L₂. It is most concisely
expressed as a context-free Lindenmayer system.49 The general class is called 0L
grammars: G = {E, P, a} consisting of the symbol alphabet, production rules, and
start string, respectively. This computational model is a class of parallel rewrite
automata in which all symbols in a word have the production rules simultaneously
applied, with the neighboring symbols playing no role in the selection of which
production. The symbol alphabet is Σ = {0, 1}. The production rules P are quite
simple, P = {0 → 11, 1 → 10}, and start with the string σ = 1. This system
generates the infinite sequence L₁ and, allowing the choice of when to stop the
productions, it generates L₂ = {1, 10, 1011, 10111010, ...}.
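The 0L-system is small enough to run directly. A minimal sketch (function names ours):

```python
# The 0L-system G = {Sigma, P, sigma} with alphabet Sigma = {0, 1},
# productions P = {0 -> 11, 1 -> 10}, and start string sigma = "1".
# All symbols are rewritten in parallel at each step; stopping after
# k steps yields the k-th word of L2.
P = {"0": "11", "1": "10"}

def step(w):
    """One parallel rewrite: every symbol replaced simultaneously."""
    return "".join(P[c] for c in w)

w, words = "1", []
for _ in range(5):
    words.append(w)
    w = step(w)

assert words == ["1", "10", "1011", "10111010", "1011101010111011"]
```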
Although the L-system model of the transition behavior is quite simple, as a
class of models its parallel nature is somewhat inappropriate. L-systems produce
both "early" and "late" symbols in a string at every production step, whereas the
dynamical system in question produces symbols sequentially. This point is even
more obvious when these symbol sequences are considered as sequential measure-
ments. The associated L-system model would imply that the generating process had
an infinite memory of past measurements and accessed them arbitrarily quickly. The
model class is too powerful.
This can be remedied by converting the 0L-system to its equivalent in the
Chomsky hierarchy of sequential computation. The Chomsky equivalent is a restricted indexed context-free grammar G_c = {N, I, T, F, P, S}.1 A central feature
of the indexed grammars is that they are a natural extension of the context-free
languages that allow for a limited type of context sensitivity via indexed produc-
tions, while maintaining properties of context-free languages, such as closure and
decidability, that are important for compilation. For the limit language the com-
ponents are defined as follows. N = {S, T} is the set of nonterminal variables with
S → Tg → BgAg → FAg → 1Ag → 1E → 10   (1)

S → Tfg → BfgAfg → DgAfg → BgAgAfg → FAgAfg
  → 1AgAfg → 1EAfg → 10Afg → 10Cg → 10BgBg → 10FBg
  → 101F → 1011   (2)
Productions are applied to the left-most nonterminal in each step. Consequently, the
terminal symbols {0,1} are produced sequentially left to right in "temporal" order.
In the first line, notice how the indices distribute over the variables produced by the
production T → BA. When an indexed production is used, an index is consumed:
as in Bg → F in going from the first to the second line above.
All of the languages in the Chomsky hierarchy have dual representations as
grammars and as automata. The machine corresponding to an indexed context-free
language is the nested stack automaton (NSA).2 This is a generalization of the push-
down automaton: a finite-state control augmented with a last-in first-out memory
or stack. An NSA has the additional ability to move into the stack in a read-only
mode and to insert a new (nested) stack at the current stack symbol being read.
It cannot move higher in the stack until it has finished with the nested stack and
removed it. The restricted indexed context-free grammar for L2 is recognized by
the one-way nondeterministic NSA (1NNSA) shown in Figure 17. The start state
is q. The various actions label the state transition edges. $ denotes the top of the
current stack and the cent sign, the current stack bottom. The actions are one of
three forms:
1. α → β, where α and β are patterns of symbols on the top of the current stack;
2. α → {1, −1}, where these indicate moving the head up or down the stack,
respectively, upon seeing the pattern α at the current stack top;
3. (t, $t) → (1, $), where t is a symbol read off of the input tape and compared
to the symbol at the top of the stack. The "1" indicates that the input head
advances to the next symbol on the input tape. The symbol on the stack's top
is removed: $t → $.
In all but one case the actions are in the form of a symbol pattern on the top of
the stack leading to a replacement pattern and a stack head motion. The notation
on the figure uses a component-wise shorthand. For example, the productions are
implemented on the transition labeled ${S,T,T,C,D,E,F} → ${Tg,Tf,BA,BB,BA,0,1},
which is shorthand for the individual transitions $S → $Tg, $T → $Tf, $T → $BA,
$C → $BB, $D → $BA, $E → $0, and $F → $1. The operation of the 1NNSA mimics
the derivations in the indexed grammar. The nondeterminism here means that there
exists some set of transitions that will accept words from L2. L3 is accepted by the
same 1NNSA, but modified to accept when the end of the input string is reached
and the previous input has been accepted.
There are three conclusions to draw from these formal language results. First,
it should be emphasized that the particular details in the preceding analysis are not
essential. Rather, the most important remark is that the description at this higher
level is finite and, indeed, quite small. Despite the infinite DFA complexity, a simple
higher-level description can be found once the computational model is augmented.
Indeed, the deterministic Turing machine program to generate words in the limit
language is simple: (i) copy the current string on the tape onto its end and (ii) invert
the last bit. The limit language for the cascade transition uses little of the power
of the indexed grammars. The latter can recognize, for example, context-sensitive
languages. The limit machine is thus exceedingly weak in its implied computational
structure. Also, the only nondeterminism in the 1NNSA comes from anticipating the
length of the string to accept; a feature that can be replaced to give a deterministic
and so less powerful automaton.
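The two-instruction program just described can be sketched directly (names ours); it reproduces the words of the limit language:

```python
# The deterministic generation procedure for the limit language described
# above: (i) copy the current string onto its own end, then (ii) invert
# the last bit. Despite the infinite DFA complexity of the language, the
# program at this higher level is a few lines long.
def next_word(w):
    doubled = w + w                     # (i) copy the string onto its end
    flipped = "0" if doubled[-1] == "1" else "1"
    return doubled[:-1] + flipped       # (ii) invert the last bit

w = "1"
words = [w]
for _ in range(4):
    w = next_word(w)
    words.append(w)

assert words == ["1", "10", "1011", "10111010", "1011101010111011"]
```

Note that this sequential copy-and-invert procedure produces exactly the same words as the parallel 0L rewrite, as the text's equivalence argument requires.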
Second, it is relatively straightforward to build a continuous-state dynamical
system with an embedded universal Turing machine.[26] With this in mind, and for
its own sake, we note that by the above construction the cascade transition does not
have universal computation embedded in it. Indeed, it barely aspires to be much
more than a context-free grammar. With the formal language analysis we have
bounded the complexity at the transition to be greater than regular and context-
free languages and no more powerful than indexed context-free. Furthermore, the
complexity at this level is measured by a linearly bounded DFA growth rate c₀ = 3.
These properties leave open the possibility, though, that the language could be a
one-way nondeterministic stack automaton (1NSA).38
Finally, we demonstrated by an explicit analysis that nontrivial computation,
beyond information storage and transmission, arises at a phase transition. One
is forced to go beyond DFA models to the higher stack automaton level since the
[26] A two-dimensional map with an embedded 4-symbol, 7-state universal Turing machine53 was constructed.23
former require an infinite representation. These properties are only hinted at by the
infinite correlation length and the slow decay of two-point mutual information at
the transition.
LOGISTIC MAP
The preceding analysis holds for a wide range of nonlinear systems since it rests only
on the symbolic dynamics and the associated probability structure. It is worthwhile,
nonetheless, to test it quantitatively on particular examples. This is possible because
it rests on a (re)constructive method that applies to any data stream. This section
and the next report extensive numerical experiments on two one-dimensional maps.
The first is the logistic map, defined shortly, and the second, the piecewise linear
tent map.
The logistic map is a map of the unit interval given by

x_{n+1} = r x_n (1 − x_n)

where the parameter r controls the degree of nonlinearity; r/4 is the map's height
at its maximum x* = 1/2. This is one of the simplest, but nontrivial, nonlinear
dynamical systems. It is an extremely rich system about which much is known.12
It is fair to say, however, that even at the present time there are still a number of
unsolved mathematical problems concerning the behavior at arbitrary chaotic parameter values. The (generating) measurement partition is P = {[0, 0.5), [0.5, 1]}.
The machine complexity and information theoretic properties of this system
have been reported previously.25 Figure 18 shows the complexity versus specific
FIGURE 18 Observed complexity versus specific entropy estimate for the logistic map at 193 parameter values r ∈ [3, 4] within both periodic and chaotic regimes. Estimates on 32-cylinder trees with 16-cylinder subtree machine reconstruction, with the entropy estimated as H(16)/16, where feasible.
entropy for 193 parameter values r ∈ [3, 4]. One of the more interesting general
features of the complexity-entropy plot is clearly demonstrated by this figure: all
of the periodic behavior lies below the critical entropy H_c; and all of the chaotic,
above. This is true even if the periodic behavior comes from cascade windows of
periodicity q > 1 within the chaotic regime at high parameter values. The (h_α, C_α)
plot, therefore, captures the essential information processing, i.e., computation and
information production, in the period-doubling cascade independent of any explicit
system control.
The lower bound derived in the previous sections applies exactly to the periodic
data (H < H_c) and to the band-merging parameter values. The fit to the periodic
data is extremely accurate, verifying the linear relationship except for high periods
beyond that resolvable at the chosen reconstruction cylinder length. The fit in the
chaotic regime is also quite good. (See Figure 19.) The data are systematically
lower (≈2%) in entropy due to the use of the topological entropy in the analysis.
The measured critical entropy H_c and complexity C_c at the transition were 0.28
and 4.6, respectively.
TENT MAP
The second numerical example, the tent map, is in some ways substantially simpler
than the logistic map. It is given by

x_{n+1} = a x_n           if x_n ≤ 1/2
x_{n+1} = a (1 − x_n)     if x_n > 1/2

where the parameter a controls the height (= a/2) of the map at the maximum
x* = 1/2. The main simplicity is that there is no period-doubling cascade and,
for that matter, there are no stable periodic orbits, except at the origin for a < 1.
There is instead only a periodicity q = 1 chaotic band-merging cascade that springs
from x* at a = 1.
The piecewise linearity also lends itself to further analysis of the dynamics. Since
the map has the same slope everywhere, the Lyapunov exponent λ and the topological,
metric, and Renyi specific entropies are all equal and given by the slope: λ = h_α =
log₂ a. We can simply refer to these as the specific entropy. From this, we deduce
that, since h_α = 2^{−m} for 2^m → 2^{m−1} band-mergings, the parameter values there
are

a_m = 2^{2^{−m}} .
For L > 2^m the complexity is given by the band-merging period. And this, in turn,
is given by the number of bands. Thus, we have C_α = −log₂ h_α or

C_α = −log₂ log₂ a

as a lower bound for a > 1 and L > 2^m at an m-order band-merging.
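These band-merging relations are easy to tabulate (a sketch; purely the arithmetic above):

```python
# Lower-bound points (h, C) for the tent map band-merging cascade: at the
# m-order merging parameter a_m = 2^(2^-m), the specific entropy is
# h = log2(a_m) = 2^-m and the complexity is C = -log2(h) = m.
from math import log2, isclose

for m in range(0, 8):
    a_m = 2 ** (2 ** -m)             # band-merging parameter value
    h = log2(a_m)                    # = Lyapunov exponent = 2^-m
    C = -log2(h)                     # = -log2 log2 a_m
    assert isclose(h, 2 ** -m)
    assert isclose(C, m)             # complexity equals the merging order
```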
Since there is no stable periodic behavior, other than period one, there is a
forbidden region in the complexity-entropy plot below the critical entropy. The
system cannot exist at finite "temperatures" below H_c, except at absolute zero
h_α = 0.
Figure 20 gives the complexity-entropy plot for 200 parameter values a ∈ [1, 2].
There is a good deal of structure in this plot beyond the simple band-merging lower
bounds that we have concentrated on. Near each band-merging complexity-entropy
point, there is a slanted cluster of points. These are associated with families of
parameter values at which the iterates f^n(x*) are asymptotically periodic of various
periods. We shall discuss this structure elsewhere, except to note here that it also
appears in the logistic map, but is substantially clearer in this example.
Figure 21 shows band-merging data estimated from 16 and 20 cylinders along
with the appropriate theoretical curves for those and in the thermodynamic limit
(L = 256).
From the two numerical examples, it is clear that the theory quite accurately
predicts the complexity-entropy dependence. It can be easily extended in several
directions. Most notably, the families of Misiurewicz parameters associated with
unstable asymptotically periodic maxima can be completely analyzed. And this
appears to give some insight into the general problem of the measure of parameter
values where iterates of the maximum are asymptotically aperiodic. Additionally,
the computational analysis is being applied to transitions to chaos via intermittency
and via frequency-locking.
ACKNOWLEDGMENTS
The authors have benefited from discussions with Charles Bennett, Eric Friedman,
Chris Langton, Steve Omohundro, Norman Packard, Jim Propp, Jorma Rissanen,
and Terry Speed. They are grateful to Professor Carson Jeffries for his continuing
support. The authors thank the organizers of the Santa Fe Institute Workshop on
"Complexity, Entropy, and Physics of Information" (May 1989) for the opportunity
to present this work, which was supported by ONR contract N00014-86-K-0154.
REFERENCES
1. Aho, A. V. "Indexed Grammars - An Extension of Context-Free Grammars."
J. Assoc. Comp. Mach. 15 (1968):647.
2. Aho, A. V. "Nested Stack Automata." J. Assoc. Comp. Mach. 16 (1969):383.
3. Alekseyev, V. M., and M. V. Jacobson. "Symbolic Dynamics." Phys. Rep. 25
(1981):287.
4. Bachas, C. P., and B. A. Huberman. "Complexity and Relaxation of Hierar-
chical Structures." Phys. Rev. Lett. 57 (1986):1965.
5. Bennett, C. H. "Thermodynamics of Computation - A Review." Intl. J.
Theor. Phys. 21 (1982):905.
6. Bennett, C. H. "On the Nature and Origin of Complexity in Discrete, Homo-
geneous Locally-Interacting Systems." Found. Phys. 16 (1986):585.
7. Blahut, R. E. Principles and Practice of Information Theory. Reading, MA:
Addison-Wesley, 1987.
8. Brudno, A. A. "Entropy and The Complexity of the Trajectories of a Dynam-
ical System." Trans. Moscow Math. Soc. 44 (1983):127.
9. Chaitin, G. "On the Length of Programs for Computing Finite Binary Se-
quences." J. ACM 13 (1966):145.
10. Chaitin, G. "Randomness and Mathematical Proof." Sci. Am. May (1975):
47.
11. Chomsky, N. "Three Models for the Description of Language." IRE Trans.
Info. Theory 2 (1956):113.
12. Collet, P., and J.-P. Eckmann. Maps of the Unit Interval as Dynamical Sys-
tems. Berlin: Birkhauser, 1980.
13. Collet, P., J. P. Crutchfield, and J.-P. Eckmann. "Computing the Topological
Entropy of Maps." Comm. Math. Phys. 88 (1983):257.
14. Crutchfield, J. P., and B. A. Huberman. "Fluctuations and the Onset of
Chaos." Phys. Lett. 77A (1980):407.
15. Crutchfield, J. P., J. D. Farmer, N. H. Packard, R. S. Shaw, G. Jones, and R.
Donnelly. "Power Spectral Analysis of a Dynamical System." Phys. Lett. 76A
(1980):1.
16. Crutchfield, J. P., J. D. Farmer, and B. A. Huberman. "Fluctuations and
Simple Chaotic Dynamics." Phys. Rep. 92 (1982):45.
17. Crutchfield, J. P., and N. H. Packard. "Symbolic Dynamics of One-Di-
mensional Maps: Entropies, Finite Precision, and Noise." Intl. J. Theor.
Phys. 21 (1982):433.
18. Crutchfield, J. P., and N. H. Packard. "Noise Scaling of Symbolic Dynam-
ics Entropies." Evolution of Order and Chaos, edited by H. Haken. Berlin:
Springer-Verlag, 1982, 215.
19. Crutchfield, J. P., and N. H. Packard. "Symbolic Dynamics of Noisy Chaos."
Physica 7D (1983):201.
20. Crutchfield, J. P. Noisy Chaos. University of California, Santa Cruz, pub-
lished by University Microfilms Intl., Minnesota, 1983.
Results of Feynman and others have shown that the quantum formalism
permits a closed, microscopic, and locally interacting system to perform
deterministic serial computation. In this paper we show that this formal-
ism can also describe deterministic parallel computation. Achieving full
parallelism in more than one dimension remains an open problem.
INTRODUCTION
In order to address questions about quantum limits on computation, and the possi-
bility of interpreting microscopic physical processes in informational terms, it would
be useful to have a model which acts as a bridge between microscopic physics and
computer science.
Feynman and others2,6,10 have provided models in which closed, locally interacting microscopic systems described in terms of the quantum formalism perform
deterministic computations. Up until now, however, all such models implemented
deterministic serial computation, i.e., only one part of the deterministic system is
active at a time.
We have the prejudice that things happen everywhere in the world at once,
and not sequentially like the raster scan which sweeps out a television picture.
It would be surprising, and perhaps a serious blow to attempts to ascribe some
deep significance to information in physics, if it were impossible to describe parallel
computations within the quantum formalism.
In this paper, we extend the discussion of a previous paper10 to obtain for
the first time a satisfactory model of parallel "quantum" computation, but only
in one dimension. The two-dimensional system discussed in the previous paper10
is also shown to be a satisfactory model, but the technique used here only allows
one dimension to operate in parallel: the more general problem of the possibility
of fully parallel two- or three-dimensional quantum computation remains open.
COMPUTATION
The word computation is used in many contexts. Adding up a list of numbers is a
kind of computation, but this task requires only an adding machine, not a general
purpose computer. Similarly, we can compute the characteristics of airflow past an
aircraft's wing by using a wind tunnel, but such a machine is no good for adding
up a list of numbers.
An adding machine and a wind tunnel are both examples of computing ma-
chines: machines whose real purpose is not to move paper or air, but to manipulate
information in a controlled manner. It is the rules that transform the information
that are important: whether the adding machine uses enormous gears and springs,
or microscopic electronic circuits, as long as it follows the addition algorithm cor-
rectly, it is acting as an adding machine.
A universal computer is the king of computing machines: it can simulate the
information transformation rules of any physical mechanism for which these rules
are known. In particular, it can simulate the operation of any other universal
computer—thus all universal computers are equivalent in their simulation capabil-
ities. It is an unproven, but thus far uncontradicted contention of computer theory
that no mechanism is any more universal than a universal digital computer, i.e.,
one that manipulates information in a discrete form.
Assuming a finite universe, no machine can have a truly unbounded memory;
what we mean when we talk about a general purpose computer is a machine that,
if it could be given an unbounded amount of memory, would be a universal com-
puter. (In common usage, the terms general purpose computer and computer are
synonymous.) Similarly, when we talk about a finite set of logic elements as being
universal, we mean that an unbounded collection of such elements could constitute
a universal computer.
Parallel Quantum Computation 275
QUANTUM COMPUTATION
Although all general purpose computers can perform the same computations, some
of them work faster, use less energy, weigh less, are quieter, etc., than others. In
general, some make better use of the computational opportunities and resources
offered by the laws of physics than do others. For example, since signals travel so
slowly (it takes about a nanosecond to go a foot, at the speed of light), there is a
tremendous speed advantage in building computers which have short signal paths.
Modern microprocessors have features that are only a few hundred atoms across:
such small components can be crowded close together, allowing the processor to be
small, light, and fast.
As we try to map our computations more and more efficiently onto the laws
and resources offered by nature, we are eventually confronted with the question
of whether or not we can arrange for extremely microscopic physical systems to
perform computations. What we ask here is in a sense the opposite of the hidden
variables question: we ask not whether a classical system can simulate a quantum
system in a microscopic and local manner, but rather, whether a quantum system
can simulate a classical system in such a manner.
All of our discussion of quantum computation will be based on autonomous
systems: we prepare the initial state, let the system undergo a Schrödinger evolution
as an isolated system, and after some amount of time we examine the result.[1]
Since the Schrödinger evolution is unitary, and hence invertible, we must base our
computations on reversible logic.
[1] For some types of computations, we can't set a very good limit on how long we should let it
run before looking. In such cases, we would simply start a new computation if we look and find
that we aren't finished.
276 Norman Margolus
REVERSIBLE COMPUTATION
Until recently, it was thought that computation is necessarily irreversible: it was
hard, for instance, to imagine a useful computer in which one could not simply erase
the contents of a register. It was to most people a rather surprising result3,7,8,9,13
that computers can be constructed completely out of invertible logic elements,
and that such machines can be about as easy to use as conventional computers.
This result has thermodynamic consequences, since it turns out that a reversible
computer is the most energy efficient engine for transforming information from
one form to another. This result also means that computation is not necessarily a
(statistically irreversible) macroscopic process.
As an example of an invertible logic element, consider the Fredkin gate of
Figure 1. This gate is its own inverse (two connected in series give the identity
function), and it is a universal logic element: any invertible logic function can be
constructed out of Fredkin gates. A logic circuit made out of Fredkin gates
looks much like any conventional logic circuit, except that special "mirror-image
circuit" techniques are used to avoid the accumulation of undesired intermediate
results that we aren't allowed to simply erase (see Fredkin and Toffoli7 for more
details).
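The Fredkin gate is easy to state concretely. Here is a minimal sketch in Python; the self-inverse property and the standard constant-input constructions of AND and NOT are checked exhaustively (the bit-level conventions below are the usual controlled-swap ones, not taken verbatim from the figure):

```python
from itertools import product

def fredkin(c, a, b):
    """Fredkin gate: a controlled swap. If the control bit c is 1,
    the two data bits a and b are exchanged; otherwise all pass through."""
    return (c, b, a) if c else (c, a, b)

# The gate is its own inverse: two in series give the identity on all 8 inputs.
for bits in product((0, 1), repeat=3):
    assert fredkin(*fredkin(*bits)) == bits

# Universality via constant inputs: with b = 0 the third output is c AND a,
# and with (a, b) = (0, 1) the third output is NOT c, while c itself passes
# through unchanged (giving fan-out of the control).
assert all(fredkin(c, a, 0)[2] == (c & a) for c in (0, 1) for a in (0, 1))
assert all(fredkin(c, 0, 1)[2] == 1 - c for c in (0, 1))
```

Note that the gate is also conservative: the number of 1-bits on its outputs always equals the number on its inputs, which is why constant inputs and "garbage" outputs appear in the AND/NOT constructions.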
Feynman made a quantum system simulate a collection of invertible logic gates
connected together in a combinational circuit (i.e., one without any feedback).[2]
In Feynman's construction, only one logic element was active (i.e., transforming its
inputs into outputs) at any given time: the different gates were activated one at a
time as they were needed to act on the output of gates that were active earlier. We
can imagine a sort of "fuse" running through our circuit: as the active part of the
fuse passes each circuit element in turn, it activates that element. Using a collection
of two-state systems (which he called atoms) to represent bits, Feynman made a
"quantum" version of this model. In what follows, we will think of our two-state
systems as spin-1/2 particles.
[2] Although combinational circuitry can perform any desired logical function, computers are usually constructed to run in a cycle, reusing the same circuitry over and over again. The parallel
models discussed later in this paper run in a cycle.
where a and b are lowering operators on the two spins, and a† and b† are their
Hermitian adjoints, which are raising operators on the two spins.
Without any claim yet to a connection with quantum mechanics, we can cast
the overall logical function implemented by an N-gate invertible combinational logic
function into the language of linear operators acting on a tensor product space as
follows:
F = Σ_{k=1}^{N} F_k c_k c†_{k+1}    (1)
where c_k is the lowering operator on the clock spin that passes next to the kth gate
F_k. If we start all of the clock spins off in the down state except for the spin next
to the first gate, then if F acts on this system, only the term

F_1 c_1 c†_2
will be nonvanishing. This term will cause the spins acted upon by the first gate to
be updated, the first clock spin will be turned down, and the second clock spin will
go up. Similarly, if F acts again, the second gate will update, and the up spin will
move to the third position. Clearly if the initial state has only a single clock spin
[3] Less physical models were proposed earlier by Benioff,1 who seems to have been the first to
raise the question of quantum computation in print.
up, F will preserve that property. Using the position of the up clock spin to label
the state, then if |1⟩ is the initial state, F|1⟩ = |2⟩, and in general F|k⟩ = |k + 1⟩.
We have thus been able to write the forward time-step operator as a sum of local
pieces by serializing the computation—only one gate in the circuit is active during
any given step.
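This clock mechanism can be verified numerically. The sketch below (NumPy assumed; the gate operators F_k are dropped, leaving only the clock dynamics) builds the sum of c_k c†_{k+1} terms on a few two-state clock spins and checks that it advances a single up spin by one position:

```python
import numpy as np

n = 4                                   # number of clock spins
c = np.array([[0., 1.], [0., 0.]])      # lowering operator: |up> -> |down>
cd = c.T                                # raising operator:  |down> -> |up>

def site_op(op, k):
    """Embed a single-spin operator at site k of the n-spin product space."""
    out = np.array([[1.]])
    for i in range(n):
        out = np.kron(out, op if i == k else np.eye(2))
    return out

# Clock part of the forward operator, with the gate operators F_k dropped:
F = sum(site_op(c, k) @ site_op(cd, k + 1) for k in range(n - 1))

def ket(k):
    """|k>: clock spin k up, all others down (site 0 is most significant)."""
    v = np.zeros(2 ** n)
    v[1 << (n - 1 - k)] = 1.
    return v

# F advances the clock by one position: F|k> = |k+1>,
# and the single-up-spin property is preserved.
for k in range(n - 1):
    assert np.allclose(F @ ket(k), ket(k + 1))
assert np.allclose(F @ ket(n - 1), 0)   # the last clock spin has no successor
```

Each term c_k c†_{k+1} annihilates every one-up-spin state except the one with the up spin at position k, which is why only a single term fires at each step.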
Notice that the operator F†_k is the inverse of F_k, since the role of raising and
lowering operators is interchanged. Similarly, F† is the inverse of F, since each term
of the former undoes the action of the corresponding term of the latter, including
moving the clock spin back one position. Now if we add together the forward and
backward operators, we get an Hermitian operator H = F + F† which is the sum
of local pieces, each piece acting only on a small collection of neighboring spins
(a gate). At this point we make contact with quantum mechanics, by seeing what
happens if we use this H as the Hamiltonian in a Schrödinger evolution.
If we expand the time evolution operator U(t) = e^{−iHt} we get

U(t) = 1 − iHt − (t²/2) H² + ⋯ = 1 − i(F + F†)t − (t²/2)(F + F†)² + ⋯

and so we get a sum of terms, each of which is proportional to F or F† to some
power. Thus if |k⟩ is evolved for a time t, it becomes U(t)|k⟩, which is a superposition of configurations of the serialized computation which are legitimate successors
and predecessors of |k⟩: each term in the superposition has a single clock spin at
some position, and the computation is in the corresponding state.
Feynman now noted that the operators F_k don't affect the dynamics of the c_k's:
we can consider F = Σ_k c_k c†_{k+1} for the purposes of analyzing the evolution of the
clock spins. But then H = F + F† supports superpositions of the one-spin-up states
called spin waves, as is well known. When we add back in the Fk's, the computation
simply rides along at a uniform rate on top of the clock spin waves. This point will
be discussed in more detail below, when we extend this serial model to deal with
parallel computation.
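This spin-wave behavior is simple to check numerically. On the one-up-spin subspace F reduces to the N×N shift matrix |k⟩ → |k+1⟩; the sketch below (NumPy assumed; a periodic chain is used for convenience) diagonalizes H = F + F† and confirms that U(t) = e^{−iHt} is unitary and spreads an initial step |1⟩ over both successors and predecessors:

```python
import numpy as np

N = 24
F = np.roll(np.eye(N), 1, axis=0)   # shift on the clock subspace: F|k> = |k+1 mod N>
H = F + F.T                         # H = F + F†: a Hermitian hopping chain

w, V = np.linalg.eigh(H)            # diagonalize once; U(t) = V e^{-iwt} V†
def U(t):
    return (V * np.exp(-1j * w * t)) @ V.T

psi0 = np.zeros(N); psi0[0] = 1.0   # computation at step |1>
psi = U(3.0) @ psi0

assert np.isclose(np.linalg.norm(psi), 1.0)      # evolution is unitary
assert np.allclose(U(3.0) @ U(-3.0), np.eye(N))  # and invertible
assert abs(psi[0]) < 1.0                         # amplitude has spread from |1>
assert abs(psi[1]) > 1e-3 and abs(psi[-1]) > 1e-3  # to successors and predecessors
```

The eigenstates of H are exactly the spin waves mentioned in the text; adding the F_k's back in makes the computation ride along on these waves without changing the clock dynamics.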
PARALLEL COMPUTATION
Serial computers follow an algorithm step by step, completing one step before be-
ginning the next; parallel computers make it possible to do several parts of the
problem at once in order to finish a computation sooner. Although Feynman's con-
struction is based on a serial model, his idea of concentrating all of the quantum
uncertainty into the time of completion, while leaving none in the correctness of the
computation, can be extended to parallel computations.10 Maintaining correctness
is again achieved simply by construction of the Hamiltonian: states in the Hilbert
space that correspond to configurations on a given computational orbit form an
invariant subspace under the Schrödinger evolution. This property of the Hamiltonian does not, in general, say anything about the rate at which we can compute.
Here we show that Feynman's technique for making a serial model of quantum com-
putation run at a constant rate can, in fact, also be extended to apply to a parallel
system, in particular to the one-dimensional analogue of the case considered in the
previous paper.10 From this, we can derive a way of making the two-dimensional
system considered in the previous paper10 compute at a constant rate, but with
parallelism that extends over only one dimension.
For simplicity, our discussion of parallel computers will be confined to cellular
automata (CA): uniform arrays of computing elements, each connected only to its
neighbors. These systems can be universal in the strong sense that a given universal
cellular automaton (assuming it's big enough) can simulate any other computing
structure of the same dimensionality at a rate that is independent of the size of
the structure.[4] By showing that, given any desired (synchronous) CA evolution,
we can write down a Hamiltonian that simulates it, we will have shown that the
QM formalism is computationally universal in this strong sense, at least for one-
dimensional rules.
Feynman's model involved only states in which a single site was active at a time.
In order to accommodate both neighbor interactions and parallelism in quantum
mechanics, we find that we are forced to consider asynchronous (no global time)
computing schemes (but still employing invertible logic elements). For suppose that
our Hamiltonian is a sum of pieces each of which only involves neighbor interactions
H = Σ_{x,y,z} H_{x,y,z}    (2)
Then consider the time evolution 1 − iHt over an infinitesimal time interval t. When
this operator acts on a configuration state of our system, we get a superposition of
configuration states: one term in the superposition for every term in the sum (Eq. 2)
above. If we want all of the terms in this superposition to be valid computational
states, then we must allow configurations in which one part has been updated, while
everything else has been left unchanged.
LOCAL SYNCHRONIZATION
One can perform an effectively synchronous computation using an asynchronous
mechanism by adding extra state variables to keep track of relative synchronization
(how many more times one portion of the system has been updated than an adjacent
portion). To use an analogy, consider a bucket brigade carrying a pile of stones up
a hill. You hand a stone to the first person in line, who passes it on to the next,
and so on up the hill. An asynchronous computation would correspond to each
individual watching the person ahead, and passing a stone along as soon as that
person has gotten rid of theirs. This involves only local synchronization. A
[4] This isn't the usual definition of universality in CA, but it is the one that we'll use here.
[Figures 2 and 3: a clock-bit configuration ⋯ 0 0 0 1 0 1 1 0 1 0 1 0 0 1 0 1 ⋯ for a one-dimensional pairing automaton, and the corresponding spacetime diagram of relative time phases (cf. text).]
Notice that with this scheme, two adjacent cells cannot get more than one step
apart in update-count: since this count is only used to tell whether a given cell is
using the even step pairing or the odd step pairing, and to tell if adjacent cells are
at the same step, we only need to look at the least significant bit of the update-count. Thus if we take our original synchronous automaton and add a single bit of
update-count to each cell, we can run the system asynchronously while retaining a
perfectly synchronous causality.
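The claim that a single clock bit per cell suffices can be tested directly in software. The sketch below (plain Python; the block rule is an arbitrary invertible pairing rule chosen for illustration) runs a periodic chain of cells both synchronously and with random asynchronous block updates gated by per-cell update counts, and checks that the final configurations agree:

```python
import random

# An arbitrary invertible rule on a cell pair (any bijection of the 4 states works).
RULE = {(0, 0): (0, 1), (0, 1): (1, 0), (1, 0): (1, 1), (1, 1): (0, 0)}

def synchronous(state, T):
    """T global steps; even steps pair (0,1),(2,3),..., odd steps pair (1,2),(3,4),..."""
    state = list(state)
    n = len(state)
    for t in range(T):
        for i in range(t % 2, n, 2):
            j = (i + 1) % n
            state[i], state[j] = RULE[state[i], state[j]]
    return state

def asynchronous(state, T, seed):
    """Random local updates; a pair may fire only when both cells are at the same
    step and that step's pairing includes them. Only the parity of the per-cell
    count (the 'clock bit') is consulted locally."""
    rng = random.Random(seed)
    state = list(state)
    n = len(state)
    count = [0] * n
    while min(count) < T:
        i = rng.randrange(n)
        j = (i + 1) % n
        # fire only if i is the left member of a block at its current step
        if count[i] == count[j] < T and count[i] % 2 == i % 2:
            state[i], state[j] = RULE[state[i], state[j]]
            count[i] += 1
            count[j] += 1
    return state

init = [0, 1, 1, 0, 1, 0, 0, 1]
# Any asynchronous update order reproduces the synchronous result.
assert asynchronous(init, 6, seed=1) == synchronous(init, 6)
assert asynchronous(init, 6, seed=2) == synchronous(init, 6)
```

Deadlock cannot occur: among the cells at the minimum update count there is always at least one block whose two members are both at that count, so the valleys of time always fill in.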
In Figure 2 we show a possible state for the update-count bits (henceforth we'll
call them clock bits) in a one-dimensional pairing automaton of the type we've been
discussing, which is consistent with an evolution starting from a synchronous initial
state. In Figure 3 we use a spacetime diagram to integrate the relative time phases:
arbitrarily calling the time at the left hand position t = 0, we mark cells using the
relative time information encoded in the clock bits. As we move across, if a cell is
at the same time as its neighbor to the left, we mark it at the same time on this
diagram, if it is ahead, we mark it one position ahead, etc. The result is a diagram
illustrating the hills and valleys of time present in this configuration. Note that we
can tell if a given cell in Figure 2 which is at a different time phase than its neighbor
to the left is ahead or behind this neighbor by seeing whether or not it is waiting
for the neighbor to catch up in order to be paired with it.
Note that if we allow backward steps, this synchronization scheme still works
fine: we can imagine that a backward step is simply undoing a forward step, get-
ting us to a configuration we could have gotten to by starting at an earlier initial
synchronous step, and running forward.
These configurations then, with their hills and valleys of time, will be the
classical configurations which our quantum system will simulate.
F = Σ_{i even} F_i c†_i c†_{i+1} + Σ_{i odd} F_i c_i c_{i+1}    (3)
This operator, acting on a state |n, α⟩, produces a superposition of states each
of which belongs to time n + 1. Similarly, F† takes us backwards one time step.
Note, however, that F† is not the inverse of F. Nevertheless, on the subspace of
computational configurations (those that can be obtained by a sequence of local
updates starting from a synchronous configuration) F and Ft commute: this prop-
erty, which will be proven below, will be crucial in our construction.
As before, we let H = F + F†, and if we expand the time evolution operator
U(t) = e^{−iHt} we get a superposition of terms, each involving products of F_i's
and F†_j's for various i's and j's. Since each such term, acting on a computational
configuration, gives us another computational configuration (by construction of the
clock bits), the time evolution U doesn't take us out of our computational subspace.
RUNNING IN PARALLEL
Now we would like to have our parallel computation run forward at a uniform
rate. We are imagining that our space is periodic: the chain of cells is finite and
the ends are joined. Designating one particular state of the equivalent globally
synchronous computation as t = 0, we can assign a value of t to every configuration
on each synchronous computational orbit, and from these assign a value of n to the
integrated time on every locally synchronized computational configuration. Thus
we can construct an operator N which, acting on a configuration |n, α⟩, returns n:

N |n, α⟩ = n |n, α⟩
It turns out that the one-dimensional version of Feynman's serial model is a special
case of the model discussed above: if we complement the meaning of every second
clock spin (say, all the ones at even positions), Eq. 3 becomes
F = Σ_{i even} F_i c_i c†_{i+1} + Σ_{i odd} F_i c_i c†_{i+1} = Σ_i F_i c_i c†_{i+1}
which is of exactly the same form as Eq. 1. An initial state containing a single up
clock spin and all the rest down would correspond, in our parallel system of Eq. 3,
to all of the even clock spins up, and all of the odd ones down, except for the spin
at the active position k, which is the same as its two neighbors. Since updating
in our parallel model only occurs at positions where two adjacent clock spins are
the same, there are only two active blocks in such an alternating configuration:
the block involving k and k + 1, which will be a step forward if updated, and the
block involving k and k — 1, which will be a step back if updated. If we draw a
spacetime diagram of the clock spins around position k (see Figure 4) showing the
relative synchronization implied by the alternating pattern of clock spins, we see
that it forms a staircase with a landing that moves up and down in time as its
leading edge or trailing edge is updated. Because the space is periodic, the top of
this staircase is connected to the bottom: this configuration is not on the orbit of
any synchronous parallel computation.
[Figure 4: spacetime diagram of the alternating clock-spin pattern (⋯ 0 1 0 1 0 1 0 0 0 1 0 1 0 1 ⋯) around the active position k, showing the staircase with a landing that moves up and down in time (cf. text).]
CONCLUSIONS
The study of the fundamental physical limits of efficient computation requires us
to consider models in which the mapping between the computational and physical
degrees of freedom is as close as is possible. This has led us to ask whether the
structure of quantum mechanics is compatible with parallel deterministic compu-
tation. If the answer were no, then such computation would in general have to be a
macroscopic phenomenon. In fact, at least in one dimension, it does seem possible
ACKNOWLEDGMENTS
I would like to gratefully acknowledge conversations with R. P. Feynman in which
he pointed out to me the relationship between my parallel model and his serial
model, and discussions with L. M. Biafore in which it became evident that the
one-dimensional version of my parallel QM construction might be made to run at
a uniform rate.
This research was supported by the Defense Advanced Research Projects Agen-
cy and by the National Science Foundation.
REFERENCES
1. Benioff, P. A. "Quantum Mechanical Hamiltonian Models of Discrete Pro-
cesses that Erase Their Own Histories: Application to Turing Machines." Int.
J. Theor. Physics 21 (1982):177-202.
2. Benioff, P. A. "Quantum Mechanical Hamiltonian Models of Computers." In
the proceedings of the conference "New Ideas and Techniques on Quantum
Measurement Theory," Jan. 1986. Ann. New York Acad. Sci. 480 (1986):475-
486.
3. Bennett, C. H. "Logical Reversibility of Computation." IBM J. Res. & Devel.
17 (1973):525.
4. Deutsch, D. "Quantum Theory, the Church-Turing Hypothesis, and Universal
Quantum Computers." Proc. Roy. Soc. Lond. A 400 (1985):97-117.
5. Feynman, R. P. "Simulating Physics with Computers." Int. J. Theor. Phys.
21 (1982):467.
6. Feynman, R. P. "Quantum Mechanical Computers." Opt. News 11 (1985).
7. Fredkin, E., and T. Toffoli. "Conservative Logic." Int. J. Theor. Phys. 21
(1982):219.
8. Landauer, R. "Irreversibility and Heat Generation in the Computing Pro-
cess." IBM J. Res. & Devel. 5 (1961):183.
9. Margolus, N. "Physics-Like Models of Computation." Physica 10D (1984):81.
10. Margolus, N. "Quantum Computation." In the proceedings of a conference
"New Ideas and Techniques on Quantum Measurement Theory," Jan. 1986.
Ann. New York Acad. Sci. 480 (1986):487-497.
11. Margolus, N. "Physics and Computation." Ph.D. Thesis, Tech. Rep.
MIT/LCS/TR-415, MIT Laboratory for Computer Science, 1988.
12. Peres, A. "Reversible Logic and Quantum Computers." Phys. Rev. A 32
(1985):3266-3276.
13. Toffoli, T. "Computation and Construction Universality of Reversible Cellu-
lar Automata." J. Comp. Sys. Sci. 15 (1977):213.
14. Toffoli, T. "Cellular Automata as an Alternative to (Rather than an Ap-
proximation of) Differential Equations in Modeling Physics." Physica 10D
(1984):117.
15. Toffoli, T., and N. Margolus. Cellular Automata Machines: A New Environ-
ment for Modeling. Cambridge: MIT Press, 1987.
16. Zurek, W. H. "Reversibility and Stability of Information Processing Sys-
tems." Phys. Rev. Lett. 53 (1984):391.
W. G. Teich and G. Mahler
Institut für Theoretische Physik, Universität Stuttgart, Pfaffenwaldring 57, 7000 Stuttgart 80,
FRG
1. INTRODUCTION
In the last couple of decades, enormous progress has been achieved in the miniaturization of hardware elements for computing devices based on conventional microelectronics. The 4-megabit chip is meanwhile in commercial use, and the
dimension of individual elements has already reached the submicrometer regime.
But for a length scale of the order of the de Broglie wavelength (typically a few
nanometers), quantization effects become important. The miniaturization process,
which so far has been limited by incomplete mastery of the technology needed to build the
respective structures (mask production, etching, etc.), is ultimately bounded by fundamental
physical constraints. In order to further reduce the size of the hardware elements
and to increase integration, new physical concepts of information processing must
be developed. Investigations in this direction can be summarized as "molecular
electronics."2 It is concerned with information processing systems where the ba-
sic elements have a dimension of a few nanometers and, therefore, possess typical
molecular properties like a discrete energy subspectrum. It is not limited to organic
macromolecules as a new class of substances, but includes possible realizations in
the form of semiconductor heterostructures ("quantum dots") as well.
FIGURE 1 Quantum optical model for a single cell A: three states |1⟩, |2⟩, |3⟩, with transition frequencies ω31 and ω32. The coupling between the states is indicated by the "overlap" of the respective levels (cf. text).
Information Processing at the Molecular Level 293
a single trapped ion.9 In this case the corresponding coupling between the states
is due to symmetry-based selection rules. However, the various states of a single
trapped ion do not possess different dipole moments and, therefore, cannot be cou-
pled via the dipole-dipole interaction as described below.
Applications include an array of uncoupled cells which might be used as an
information storage device, similar to persistent hole burning.8 Storage capacities
of up to 109 bits/cm2 might be achieved in this way.11
ħ ω31(4) − ħ ω31(5) = (p_A p_B / 4πε₀R³) F(θ_A, θ_B)    (1)
FIGURE 2 Schematic drawing of two coupled cells A and B with different transition
frequencies, separated by a distance R.
to apply frequencies which can simultaneously lead to transitions in cell A and cell
B.
The machine table (Table 2) describes the possible control over the system. By
two simultaneous laser pulses with frequencies ω31(4) and ω31(5), it is possible to
prepare state |2⟩ in cell A independent of the state of cell B. Similarly, all other
states can be prepared independently. By applying a laser pulse with the single
frequency ω32(4), the new state of cell A depends on the old states of cell A and
cell B. For suitable coding this mapping represents a logical "OR." Similarly, all
other elementary logical functions can be realized.13
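The conditional pulse can be mimicked with a toy model. In the sketch below (plain Python; the bit coding |1⟩ = 1, |2⟩ = 0 for A and |4⟩ = 1, |5⟩ = 0 for B is an assumption chosen for illustration, not taken from the paper), a pulse at ω32(4) flips A's |2⟩ → |1⟩ only when neighbor B is in |4⟩, and with this coding the resulting mapping is exactly a logical OR:

```python
def pulse_w32_4(a, b):
    """Toy model of a pulse at frequency w32(4): it drives cell A's |2> -> |1>
    transition only when neighbor B is in |4>, since the dipole-dipole shift
    detunes the transition when B is in |5>. States are labeled by strings."""
    if a == '2' and b == '4':
        a = '1'
    return a, b

# Assumed bit coding (for illustration only): A: |1> = 1, |2> = 0; B: |4> = 1, |5> = 0.
CODE_A = {'1': 1, '2': 0}
CODE_B = {'4': 1, '5': 0}

for a in '12':
    for b in '45':
        new_a, _ = pulse_w32_4(a, b)
        assert CODE_A[new_a] == (CODE_A[a] | CODE_B[b])   # the pulse computes OR
```

With a different coding or a different choice of rows from the machine table, the remaining elementary logic functions can be realized in the same way.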
δω of the laser pulse, which, however, must be smaller than the separation Δω
between different frequency bands. In this case the transition probability for each
cell depends only on the state of the adjacent cells. The required bandwidth can be
found to be13

δω ≈ 0.6 Δω .    (2)
In order to achieve distinct transition frequencies for each of the four possible
configurations of nearest neighbors, the frequency shift of the left and right neighbor
must be different. Due to the R⁻³ dependence of the dipole-dipole interaction, this
can be realized by an asymmetric arrangement of the cells, i.e., the distance to the
left and the right neighbor is different. This symmetry breaking physically defines
a direction on the chain, which in turn is necessary to obtain a directed information
flow in the system.
Since individual cells cannot be addressed selectively either in real space or
in frequency space, the preparation of the cellular structure must be performed
with the help of a shift operation. Starting from a physical inhomogeneity (i.e., a
cell D with transition frequencies distinct from cells A and B) any inhomogeneous
state can be prepared: a temporal pattern (successive preparations of cell D) is
transformed by successive shift operations into a spatial pattern of the cellular
structure (serial input).13 Similarly, the state of the 1-D cellular structure can be
measured.
FIGURE 3 Real space model for a linear chain of alternating cells A and B. The
distances to the left and right neighbor are in general different (R1 ≠ R2).
TABLE 2 Machine table for cells Ai and Bi.

cell Ai
B_{i−1}  B_i     |1⟩ → |2⟩   |2⟩ → |1⟩
|4⟩      |4⟩     ω31(4,4)    ω32(4,4)
|4⟩      |5⟩     ω31(4,5)    ω32(4,5)
|5⟩      |4⟩     ω31(5,4)    ω32(5,4)
|5⟩      |5⟩     ω31(5,5)    ω32(5,5)

cell Bi
A_i      A_{i+1}  |4⟩ → |5⟩   |5⟩ → |4⟩
|1⟩      |1⟩     ω64(1,1)    ω65(1,1)
|1⟩      |2⟩     ω64(1,2)    ω65(1,2)
|2⟩      |1⟩     ω64(2,1)    ω65(2,1)
|2⟩      |2⟩     ω64(2,2)    ω65(2,2)
[Figure 5: hierarchy of length scales of the system, from the atomic scale up to the macroscopic length.]
SUMMARY
For the example of an optically controlled multistable quantum system, we have
demonstrated the connection between a complex hierarchical structure and the
complex dynamics of the system. Different length scales define various hierarchical
levels of the system (Figure 5). The number of hierarchical levels of the system (in
our case five) is a measure of the "homogeneous" complexity of the system. The
minimum number of five hierarchical levels is a prerequisite in order to realize mul-
tistability, preparation, measurement, and control, necessary to achieve a complex
dynamics which is equivalent to information processing. Neither a perfect crystal
nor an ideal gas, both of which possess only two hierarchical levels (a macroscopic
length scale and an atomic length scale), fulfills these requirements.
ACKNOWLEDGMENT
Financial support by the Deutsche Forschungsgemeinschaft (Sonderforschungsbere-
ich 329) is gratefully acknowledged.
REFERENCES
1. Blum, K. "Density Matrix Theory and Applications." New York: Plenum
Press, 1981, 63.
2. Carter, F. L., ed. "Molecular Electronic Devices." New York: Marcel Dekker,
1982.
3. Deutsch, D. "Quantum Theory, the Church-Turing Principle and the Univer-
sal Quantum Computer." Proc. R. Soc. London A 400 (1985):97.
4. Ferry, D. K., and W. Porod. "Interconnections and Architecture for Ensemble
of Microstructures." Superlatt. Microstruct. 2 (1986):41.
5. Feynman, R. P. "Quantum Mechanical Computers." Opt. News 11 (1985):11.
6. Landauer, R. "Irreversibility and Heat Generation in the Computing Process." IBM J. Res. and Dev. 5 (1961):183.
7. Landauer, R. "Fundamental Limitations in the Computational Process." Berichte der Bunsen-Gesellschaft für Physikalische Chemie 80 (1976):1041.
8. Moerner, W. E., ed. "Persistent Spectral Hole-Burning: Science and Applica-
tions." Berlin: Springer-Verlag, 1988.
9. Nagourney, W., J. Sandberg, and H. Dehmelt. "Shelved Optical Electron
Amplifier: Observation of Quantum Jumps." Phys. Rev. Lett. 56 (1986):2797.
10. Obermayer, K., G. Mahler, and H. Haken. "Multistable Quantum Systems:
Information Processing at Microscopic Levels." Phys. Rev. Lett. 58 (1987):1792.
11. Obermayer, K., W. G. Teich, and G. Mahler. "Structural Basis of Multistationary Quantum Systems. I. Effective Single-Particle Dynamics." Phys. Rev.
B37 (1988):8096.
12. Peres, A. "Reversible Logic and Quantum Computers." Phys. Rev. A32
(1985):3266.
13. Teich, W. G., K. Obermayer, and G. Mahler. "Structural Basis of Multista-
tionary Quantum Systems. II. Effective Few-Particle Dynamics." Phys. Rev.
B37 (1988):8111.
14. Teich, W. G., G. Anders, and G. Mahler. "Transition Between Incompati-
ble Properties: A Dynamical Model for Quantum Measurement." Phys. Rev.
Lett. 62 (1989):1.
15. Teich, W. G., and G. Mahler. "Optically Controlled Multistability in Nanos-
tructured Semiconductors." Physica Scripta 40 (1989):688.
16. Wolfram, S. "Theory and Applications of Cellular Automata." Singapore:
World Scientific, 1986.
17. Zurek, W. H. "Reversibility and Stability of Information Processing Sys-
tems." Phys. Rev. Lett. 53 (1984):391.
Tommaso Toffoli
MIT Laboratory for Computer Science, Cambridge, MA 02139
1. INTRODUCTION
One often speaks, generically, of 'the laws of physics.' The physicist, however, is
well aware that different kinds of laws have a different status, and according to
their status are meant to play a different role in both theory and applications. The
major status categories are roughly as follows.
■ Analytical mechanics. Here we have physics' "constitution"—the principles of
classical mechanics, relativity, quantum mechanics. When we say, "Let's consider a Hamiltonian of this form," we do not pretend that a physical system
governed by such a law actually exists; we merely imply that the law would not
be struck down by physics' supreme court as "unconstitutional."
■ Fundamental processes. Here we have those physical interactions that are actually observed and that presumably belong to nature's most fundamental repertoire. They are the "op-codes" (using Margolus' metaphor) which the Supreme
Architect actually decided to include in physics' machine language. We tentatively assume that, as in the design of a computer chip, other choices of
op-codes were potentially available and could have been equally effective.
Of course, some "grand unified theory" may later show that what appeared to
be independent choices at the op-code level are actually forced consequences
of a single master choice. Moreover, we may realize that what we thought
was a primitive op-code is actually implemented as a higher-level construct—a
"subroutine call." But we are all familiar with this kind of issues from experience
with man-made worlds.111
n Statistical mechanics. Here we have laws that emerge out of the collective
behavior of a large number of elements. The quantities involved in these laws
may not even be meaningful for individual systems or experiments. Intuitively,
one may expect that almost every detail of the microscopic interactions will
be washed out by macroscopic averaging; only features that are supported by
a definite conspiracy (such as a particular symmetry or conservation law) will
bubble up all the way to the macroscopic surface and emerge as recognizable
statistical laws.
In the past few decades, an enormous range of complex physical phenomena
have been successfully explained as inexorable statistical-mechanical consequences
of known fundamental processes or plausible stylizations thereof. Without doubt,
the reduction of phenomenology to fundamental processes via statistical mechanics
is today one of the most productive paradigms (cf. Kuhn4) of mathematical physics.
Explaining the texture of mayonnaise has become a likely subject for ten articles
in Physical Review, and no one would be surprised if its mathematics turned out
to be isomorphic to that needed to explain the fine structure of quarks.
This work on collective phenomena has revealed principles that appear to have a
universal and fundamental character not unlike that of the principles of mechanics.
In this paper, we shall turn the tables and ask, "Are perhaps the very principles
of mechanics so universal and fundamental just because they are emergent aspects
of an extremely fine-grained underlying structure, and thus chiefly mathematical
rather than physical in content?"
A coin, no matter what its composition, shape, or tossing technique, can be
characterized by a single real parameter k such that over a large number of trials
[1]The choice of op-codes for, say, the IBM/360 family of computers reveals strong constraints of
economy, consistency, and completeness. And in the cheapest models of this family many of the
op-codes documented in the machine-language manual are emulated by software traps rather than
directly implemented in hardware; the timing may be different, but the logic is identical.
How Cheap Can Mechanics' First Principles Be? 303
it will come up heads close to a fraction k of the time. The existence of such a
parameter is not usually regarded as a property of our physical world per se—a
choice made by God when establishing the laws of nature; rather, it is seen as
a mathematical consequence of almost any choice about physics God could have
made.
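The coin's statistical regularity is easy to exhibit numerically; a minimal sketch (the bias k = 0.3 and the trial counts are arbitrary choices):

```python
import random

def heads_fraction(k: float, trials: int, seed: int = 0) -> float:
    """Toss a coin of bias k (probability of heads) `trials` times
    and return the observed fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < k for _ in range(trials))
    return heads / trials

# Over a large number of trials the fraction approaches k,
# regardless of how the "coin" is implemented.
for trials in (100, 10_000, 1_000_000):
    print(trials, heads_fraction(0.3, trials))
```

The convergence is a consequence of the law of large numbers, not of any particular choice of coin.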
In the same vein, one would like to ask whether, for instance, the laws of
mechanics are symplectic because God explicitly decided to make them so, or
whether this symplectic character automatically follows out of virtually any reason-
able choice of fine-grained first principles. Similarly, can one think of simple ground
rules for physics whereby relativity would appear as little surprising as the law of
large numbers?
In this paper, we shall give some circumstantial evidence that questions of the
above kind are scientifically legitimate and intellectually rewarding. Namely, we'll
look at a number of physical concepts that are usually regarded as primitive, and
in each case we'll show a plausible route for reduction to much simpler concepts.
2. CONTINUITY
Both in the classical and the quantum description, the state of a physical sys-
tem evolves as a continuous function of time. In mathematics, it is well known that
FIGURE 1 (a) Particles on a lattice. (b) Density plot along a line y = const.
certain discrete constructs (e.g., the distribution of prime numbers, π(n)) can be
approximated by continuous ones (in this example, the Riemann function R(x)).
However, continuity does not invariably emerge from discreteness through some
universal and well-understood mechanism, so that, when it does, we are justified
in asking why. Once we understand the reasons in one case, we may hope to derive
sufficient conditions for its emergence in a more general situation. Here we'll give
an example of sufficient conditions in a kinematical context.
Consider an indefinitely extended two-dimensional lattice of spacing λ, having
a 1 ("particle") or a 0 ("vacuum") at each site, as in Figure 1(a). As we move, say,
along the x axis, the microscopic density function ρ(x, y) will display the discontinuous behavior of Figure 1(b).
Let us define a whole sequence ρₙ of new density functions, with ρₙ(x, y) denoting the average density over the square window of side nλ centered at x, y.
For example, ρ₃ can take any of the 10 values 0, 1/9, 2/9, ..., 8/9, 1 (Figure 2(a)).
However, as x increments by λ—and the corresponding window slides one lattice
position to the right—ρ₃ cannot arbitrarily jump between any two of these values;
the maximum size of a jump is 1/3 (Figure 2(b)).
In general, while ρₙ depends on the number of particles contained in the entire
window (volume effect), the change Δρₙ corresponding to Δx = ±λ depends only
on the number of particles swept by the edge of the window (surface effect); thus,

|Δρₙ| ≤ 1/n (1)
and

lim_{n→∞} Δρₙ = 0 . (2)
If now we let the lattice spacing λ decrease in the same proportion as n increases (so
that the area of the window remains constant), in the limit as n → ∞ the sequence
ρₙ converges to a uniformly continuous function of x, y.
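The volume/surface distinction can be checked directly on a random lattice: sliding an n-by-n window one site along x exchanges at most one column of n sites, so the density can change by at most n/n² = 1/n. A minimal sketch (lattice size and window sides are arbitrary, and sites rather than physical units are used):

```python
import random

def window_density(lattice, x0, y0, n):
    """Average occupation over the n-by-n window with corner (x0, y0)."""
    return sum(lattice[y0 + dy][x0 + dx]
               for dy in range(n) for dx in range(n)) / (n * n)

def max_jump(lattice, n, row=0):
    """Largest |density change| as the window slides one site along x."""
    width = len(lattice[0])
    vals = [window_density(lattice, x, row, n) for x in range(width - n)]
    return max(abs(b - a) for a, b in zip(vals, vals[1:]))

rng = random.Random(1)
lat = [[rng.randint(0, 1) for _ in range(200)] for _ in range(40)]

# The jump is a surface effect: it is bounded by 1/n, and so vanishes
# as the window grows.
for n in (3, 9, 27):
    print(n, max_jump(lat, n), "<=", 1 / n)
```

For n = 3 the bound is the 1/3 quoted in the text.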
The above considerations involving a static configuration of particles are trivial.
Now, let us introduce an arbitrary discrete dynamics (τ will denote the time spacing
between consecutive states), subject only to the following constraints:
n Locality. The state of a site at time t + τ depends only on the state at time t
of the neighboring sites.
n Particle conservation. The total number of particles is strictly conserved.
In one time step, only particles that are lying next to the window's border can
move in or out of the window: the dependency of ρₙ on t, much as that on x, is
a surface effect. If in taking the above limit we let the time spacing τ shrink in
the same proportion as the lattice spacing λ, so as to leave the "speed of light" (one
site per step) constant, the sequence ρₙ(x, y; t) converges to a uniformly continuous
function of t.
Note that, if either locality or particle conservation did not hold, ρₙ as
a function of time would not, in general, converge to a definite limit. Thus, we
have characterized a situation where the emergence of a continuous dynamics is
reducible to certain general properties of a (conceptually much simpler) underlying
fine-grained dynamics.
Is that the store where physics buys continuity? Who knows—but Occam would
say it's a good bet!
3. VARIATIONAL PRINCIPLES
In order to explicitly construct the evolution of an arbitrary dynamical system over
an indefinitely long stretch of time one needs laws in vectorial form. In the time-
discrete case, a vectorial law gives the next state u_{t+1} of the system as a function
of the current state u_t,

u_{t+1} = F u_t ; (3)
though in many cases of interest F can be captured by a more concise algorithm,
full generality demands that F be given as an exhaustive lookup table, since its
values for different values of u can in principle be completely arbitrary.
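The lookup-table character of F can be made literal; a toy sketch (the six-point state space and the particular table are arbitrary choices):

```python
# A vectorial law u_{t+1} = F(u_t) given as an exhaustive lookup table:
# here an arbitrary invertible dynamics on a six-point state space.
F = {0: 3, 1: 4, 2: 5, 3: 1, 4: 0, 5: 2}   # a permutation, i.e. invertible

def evolve(u0, steps):
    """Iterate the tabulated law for `steps` time steps."""
    u = u0
    for _ in range(steps):
        u = F[u]
    return u

print([evolve(0, t) for t in range(8)])   # → [0, 3, 1, 4, 0, 3, 1, 4]
```

The state 0 lies on an orbit of period 4; no formula more concise than the table itself is implied.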
In the continuous case, a vectorial law gives the rate of change of the current
state u(t),

(d/dt) u = f(u(t)) ; (4)
This compression factor of 2 (of 2n, for n degrees of freedom) is attained at a cost.
To obtain the current value of dq/dt, it is no longer enough to look at a single entry
of a table, as in Eq. (5); in fact, one has to determine the trend of H for variations
of p in the vicinity of the current value of (q, p), and this entails looking up a whole
range of entries. Compressed data save memory space, it is true, but entail more
computational work.
FIGURE 3 An energy variation, dE, and the corresponding action variation, dS.
dS/dT = T n(T) . (8)
Therefore, the original relation (7) will hold if and only if the orbit-length
distribution is of the form
n(T) = 1/T . (10)

n_N(T) = 1/T (12)

for any N.
In fact, we construct a specific orbit of length T by choosing T states out
of N and arranging them in a definite circular sequence. This can be done in
C(N, T) T!/T different ways. To know in how many elements of the ensemble the orbit
thus constructed occurs, we observe that the remaining N − T elements can be
connected in (N − T)! ways. Thus, the total number of orbits of length T found
anywhere in the ensemble is

C(N, T) (T!/T) (N − T)! = N!/T . (13)
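The ensemble X_N of invertible dynamics on N states is just the set of N! permutations of the state space, so the expected number of orbits (cycles) of length T per system should be 1/T, in agreement with Eq. (12). A quick empirical check (the values of N and the sample size are arbitrary):

```python
import random
from collections import Counter

def cycle_lengths(perm):
    """Lengths of the orbits (cycles) of a permutation given as a list."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = perm[x]
            length += 1
        lengths.append(length)
    return lengths

rng = random.Random(0)
N, samples = 50, 20_000
counts = Counter()
for _ in range(samples):
    perm = list(range(N))
    rng.shuffle(perm)
    counts.update(cycle_lengths(perm))

# Average number of orbits of length T per system should be close to 1/T.
for T in (1, 2, 5, 10):
    print(T, counts[T] / samples, "vs", 1 / T)
```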
Given two points a and b, one can tell whether b can be reached from a in
t steps; in particular (for t = 0), one can tell whether or not a = b.
independent of the labeling of the points, and thus preserved by any isomorphism.
Thus, for instance, one can tell how many orbits of period T are present, but of these
one cannot single out an individual one without actually pointing at it, because they
all "look the same."
To see whether there is a quantity that can be meaningfully called "energy"
in this context, let us observe that physical energy is a function E, defined on the
state space, having the following fundamental properties:
1. Conservation. E is constant on each orbit (though it may have the same value
on different orbits).
2. Additivity. The energy of a collection of weakly coupled system components
equals the sum of the energies of the individual components.
3. Generator of the dynamics. Given the constraints that characterize a particular
class of dynamical systems, knowledge of the function E allows one to uniquely
reconstruct, up to an isomorphism, the dynamics of an individual system of
that class.
The proposed identification E = log T obviously satisfies property 1.
As for property 2, consider a finite system consisting of two independent components, and let a₀ and a₁ be the respective states of these two components. Suppose
for a moment that a₀ is on an orbit of period 3, and a₁ on one of period 7; then the
overall system state (a₀, a₁) is on an orbit of length 21, i.e., log T = log T₀ + log T₁.
This argument would fail if T₀ and T₁ were not coprime. However, for randomly chosen integers the expected number of common factors grows extremely slowly with
the size of the integers themselves,7 so that approximate additivity holds almost
always.
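The additivity argument can be checked directly: the joint orbit has period lcm(T₀, T₁), which equals T₀T₁ exactly when the periods are coprime. A small sketch:

```python
from math import gcd, log

def joint_period(t0: int, t1: int) -> int:
    """Period of the combined state (a0, a1) of two independent
    components with orbit periods t0 and t1: the least common multiple."""
    return t0 * t1 // gcd(t0, t1)

# Coprime periods: energies E = log T add exactly.
assert joint_period(3, 7) == 21
print(log(21), "=", log(3) + log(7))

# Non-coprime periods: additivity fails, but only by log(gcd).
t0, t1 = 6, 10
err = log(t0) + log(t1) - log(joint_period(t0, t1))
print(err, "=", log(gcd(t0, t1)))
```

Since common factors of random integers are typically small, the error term log(gcd) is typically negligible compared with log T₀ + log T₁.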
As for property 3, an individual system of XN is completely identified—up to
an isomorphism—by its distribution n(T), and thus any one-to-one function of T (in
particular, E = log T) satisfies this property.
Note that the ensemble XN consists of all invertible systems on a state space
of size N. If we placed further constraints on the make-up of the ensemble, i.e.,
if we restricted our attention to a subset of systems having additional structure,
some of the above arguments may cease to be valid. For example, while it is true
that for large N almost all subensembles of XN retain distribution (Eq. (12)), in
a few "perverse" cases the distribution will substantially depart from 1/T, and, if
we still assume that E = log T, Eq. (7) may fail to hold. Moreover, systems that
were isomorphic within XN may no longer be so when more structure is introduced;
to allow us to tell that two systems are intrinsically different, the energy function
may have to be "taught" to make finer distinctions between states than just on the
basis of orbit length. But all this is beside the point we are making here; a fuller
discussion of these issues will be found in Toffoli.10
3.3 CONCLUSIONS
The fact that a specific variational principle of mechanics emerges quite naturally,
via statistical averaging, from very weak information-mechanical assumptions, does
not tell us much about what fine-grained structure, if any, may actually underlie
traditional physics; the relevant point is that we come to recognize that such a
principle happens to be of the right form to be an emergent feature. When we see
a Gaussian distribution in a sequence of heads and tails, we can't really tell what
coin is being tossed, but conceptual economy will make us guess that somebody is
tossing some kind of coin, rather than concocting the sequence by explicit use of
the Gaussian function.
4. RELATIVITY
The fact that the physics of flat spacetime is Lorentz, rather than Galilean, invariant
is usually treated as an independent postulate of physics, much as Euclid's fifth
axiom in geometry. In other words, God could have chosen differently; Lorentz
invariance has to be acknowledged, not derived.
However, if we look at the most naive models of distributed computation, we
see that Lorentz invariance naturally emerges as a statistical feature, and admits
of a very intuitive information-mechanical interpretation. Much as in the previous
section, we do not want to claim that this is the way relativity comes about in
nature; we just want to stress that the mathematics of relativity happens to lie in
one of those universality classes that arise from collective phenomena.
4.1 ORIENTATION
Consider the two-dimensional random walk on the x, y lattice. At the microscopic
level, this dynamics is not rotation invariant (except for multiples of a quarter-turn
rotation); however, invariance under the continuous group of rotations emerges at
the macroscopic level (Fig. 5). In fact, for r² = x² + y² ≪ t and in the limit as
t → ∞, the probability distribution P(x, y; t) for a particle started at the origin
converges to

P(x, y; t) ≈ (1/2πt) e^{−r²/2t} , (14)

i.e., it depends on x and y only through x² + y² = r².
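The emergent isotropy can be checked exactly rather than by sampling, assuming the standard four-neighbor walk: in the diagonal coordinates u = x + y, v = x − y the walk decomposes into two independent ±1 walks, so P(x, y; t) factorizes into binomials and can be compared at lattice points with equal r² but different directions. A sketch:

```python
from math import comb

def simple_walk_pmf(k: int, t: int) -> float:
    """Probability that a ±1 random walk is at k after t steps."""
    if (t + k) % 2 or abs(k) > t:
        return 0.0
    return comb(t, (t + k) // 2) / 2 ** t

def p2d(x: int, y: int, t: int) -> float:
    """Exact occupation probability of the 4-neighbor lattice walk,
    via the independent diagonal coordinates u = x+y, v = x-y."""
    return simple_walk_pmf(x + y, t) * simple_walk_pmf(x - y, t)

# Same r^2 = 25, different directions: axis point vs. off-axis point.
t = 101
print(p2d(5, 0, t), p2d(3, 4, t))   # nearly equal: isotropy at large t
```

At t = 101 the two probabilities agree to within a fraction of a percent, even though the microscopic rule singles out the lattice axes.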
Now, there is a strict formal analogy between a circular rotation by an angle θ
in the x, y plane and a Lorentz transformation with velocity β in the t, x plane—
which can be written as a hyperbolic rotation by a rapidity θ = tanh⁻¹ β:

( t' )   ( cosh θ   sinh θ ) ( t )
( x' ) = ( sinh θ   cosh θ ) ( x ) . (15)
Riding on this analogy, one may hope to find a microscopic dynamics on the t, x
lattice for which Lorentz invariance (which is out of the question at the microscopic
level) would emerge at the macroscopic level.
Let's look first at the one-dimensional random walk on a lattice, with probability p of moving to the right and q = 1 − p of moving to the left. For p = q = 1/2, the
evolution of the resulting binomial distribution is characterized, macroscopically, by
a mean μ = 0 and a standard deviation σ = √t (Figure 6(a)).
In general, μ = (p − q)t. If we shift the parameter p away from its center value
of 1/2, the center of mass of the distribution will start moving at a uniform velocity
β = p − q. Let's try to offset this motion by a Galilean transformation
FIGURE 6 (a) Symmetric random walk (p = 1/2). (b) Asymmetric random walk
(p = 3/4); note that, as the center of mass picks up a speed β = p − q, the rate of
spread goes down by a factor 1 − β².
Macroscopically, the new system will evolve, in the new frame, just as the old
system did in the old frame—except that now σ = √(4pq t) = √((1 − β²)t), so that
the diffusion will appear to have slowed down by a factor 1 − β² (Fig. 6(b)).
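The slowdown factor can be verified exactly: with unit steps (the text's conventions may differ by a constant factor), the variance per step of the displacement is 1 − (p − q)² = 4pq, so the variance after t steps is (1 − β²)t. A sketch using exact enumeration over the binomial distribution rather than sampling:

```python
from math import comb

def walk_moments(p: float, t: int):
    """Exact mean and variance of a ±1 random walk (probability p of
    stepping right) after t steps, from the binomial distribution."""
    mean = var = 0.0
    for r in range(t + 1):              # r = number of right moves
        prob = comb(t, r) * p**r * (1 - p)**(t - r)
        x = 2 * r - t                   # net displacement
        mean += prob * x
        var += prob * x * x
    return mean, var - mean**2

t = 400
for p in (0.5, 0.75):
    beta = 2 * p - 1
    mean, var = walk_moments(p, t)
    # drift beta*t; diffusion slowed by the factor 1 - beta^2
    print(p, mean, beta * t, var, (1 - beta**2) * t)
```

The drift "uses up" part of the step budget, and the diffusion constant shrinks by exactly 1 − β².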
Intuitively, as some of the resources of the "random walk computer" are shifted
toward producing coherent macroscopic motion (uniform motion of the center
of mass), fewer resources will remain available for the task of producing incoherent motion (diffusion). Thus, we get a slowdown reminiscent of the Lorentz-Fitzgerald "time expansion." In the present situation, however, the slowdown factor
is 1 − β², related to, but different from, the well-known relativistic factor √(1 − β²);
the transformation that will restore invariance of the dynamics in this case is
a Lorentz transformation followed by a scaling of both axes by a further factor
√(1 − β²).
μ and σ arise in this model from changes in the initial distribution of microscopic
states, rather than by tampering with the microscopic laws. This model is exactly
Lorentz invariant in the continuum limit, i.e., as the lattice spacing λ goes to zero.
Let us consider a one-dimensional cellular automaton having the format of
Fig. 7(a). This is a regular spacetime lattice, with a given spacing λ (lattice units
per meter). The arcs represent signals traveling at unit speed (the "speed of light");
the nodes represent events, i.e., interactions between signals. If one of the possible
signal states, denoted by the symbol 0, is interpreted as denoting the vacuum, the
remaining states can be interpreted as particles traveling on fixed spacetime tracks
(the arcs) and interacting only at certain discrete loci (the nodes). Such a system
can be thought of as a lattice gas (cf. Hardy et al.3 and Toffoli and Margolus8).
Here, we will allow no more than one particle on each track. When two particles
collide, each reverses its direction (Fig. 7(b)). As long as particles are identical (say,
all black), this reversal is indistinguishable from no interaction (Fig. 7(c)).
Now let us paint just one particle red (in which case the reversal does make
a difference), and study the evolution of its probability distribution p(x; t) when
both right- and left-going particles are uniformly and independently distributed
with linear density (particles per meter) s = n/λ—where n is the lattice occupation
density (particles per track).
FIGURE 7 (a) One-dimensional lattice, unfolded over time. The tracks, with slope ±1,
indicate potential particle paths; the nodes indicate potential collision loci. (b) Bouncing
collision. (c) No-interaction collision.
∂p/∂t = (1/2s) ∂²p/∂x² (18)

in the same circumstances (i.e., t → ∞, |x| ≪ √t) as the binomial distribution does.
We shall now introduce the freedom to independently vary the densities s₊, s₋ of,
respectively, right- and left-going particles; as a consequence, the red particle's
distribution's center of mass will drift, and its diffusion rate will be affected, too—
much as in the asymmetric random walk case. However, this time we have strict
Lorentz invariance (in the continuum limit): to every Lorentz transformation of the
coordinates, t, x ↦ t′, x′, there corresponds a similar linear transformation of the
initial conditions, s₊, s₋ ↦ s′₊, s′₋, that leaves the form of p invariant. (Indeed, the
telegrapher's equation is just another form of the Klein-Gordon equation used in
relativistic quantum mechanics.)
Lorentz invariance emerges in a similar way for a much more general class of
dynamics on a lattice, as explained in Toffoli9; more generally, features qualitatively
similar to those of special relativity appear whenever fixed computational resources
have to be apportioned between producing the inertial motion of a macroscopic
object as a whole and producing the internal evolution of the object itself (cf.
Chopard2). Thus, we conjecture that special relativity may ultimately be derived
from a simpler and more fundamental principle of conservation of computational
resources.
Is this really an unfortunate state of affairs? After all, we know that physics
itself starts deviating from special relativity when one dumps more and more matter
in the same volume. Are we witnessing the emergence of general relativity? Indeed,
the slowdown of the macroscopic evolution brought about, in models of the above
kind, by the "crowding" of the computational pathways, is strikingly analogous
to the proper-time dilation that, in physics, is brought about by the gravitational
potential.
Without more comprehensive models, precise interpretation rules, and quantitative results, any claim that the present approach might have anything to do
with modeling general relativity is, of course, premature. But it is legitimate to ask
whether fine-grained computation in uniform networks has at least the right kind
of internal resources for the task. In other words, is the emergence plausible, in
such systems, of a dynamics of spacetime analogous to that described by general
relativity? And how could it come about?
Let us start with a metaphor. On a strip of blank punch tape we can record
information at a density of, say, ten characters per inch. What if we could only avail
ourselves of used tape, found in somebody's wastebasket? Knowing the statistics of
the previous usage, one can devise appropriate group encoding techniques and error
correcting codes so as to make such a used tape perfectly adequate for recording
new information (cf. Rivest and Shamir6)—at a lower density, of course, i.e., up
to the maximum density allowed by Shannon's theorems for a noisy channel. The
proper length of the tape, defined in terms of how many characters we can record on
it, will be less than that of blank tape, by a factor that will depend on how heavy
the original usage was. If the tape is sufficiently long, its statistical properties may
significantly vary from place to place, and we may want to adapt our encoding
strategy to the local statistics—yielding a proper-length metric that varies from
place to place.
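Rivest and Shamir's point can be made concrete with their classic toy scheme (as commonly presented; the details below are a reconstruction), which records 2 bits twice in 3 write-once cells, where punched holes can be added but never removed:

```python
# First-generation codewords have weight <= 1; the second generation
# uses their bitwise complements (weight >= 2). Two writes are guaranteed.
GEN1 = {(0, 0): (0, 0, 0), (0, 1): (1, 0, 0),
        (1, 0): (0, 1, 0), (1, 1): (0, 0, 1)}
GEN2 = {v: tuple(1 - b for b in w) for v, w in GEN1.items()}

def decode(tape):
    """Read the 2-bit value currently stored on the 3-cell tape."""
    table = GEN1 if sum(tape) <= 1 else GEN2
    return next(v for v, w in table.items() if w == tape)

def write(tape, value):
    """Record `value` on the write-once tape; bits may only go 0 -> 1."""
    target = tape if decode(tape) == value else (
        GEN1[value] if sum(tape) == 0 else GEN2[value])
    assert all(t >= c for t, c in zip(target, tape)), "cannot unpunch a hole"
    return target

tape = (0, 0, 0)
tape = write(tape, (0, 1))      # first write
tape = write(tape, (1, 0))      # overwrite with a different value
print(tape, decode(tape))       # → (1, 0, 1) (1, 0)
```

Four bits of information fit into three cells, at the price of extra decoding logic: exactly the capacity/effort trade-off the metaphor describes.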
Let us extend the above metaphor from the domain of information statics to
that of information dynamics. Consider, for example, a programmable gate array
having a nominal capacity of, say, 10,000 gates. An inventor designs a clever arcade
game that takes full advantage of the chip's "computing capacity," and asks the
VLSI factory to produce a million copies of it. The game turns out to be a flop, and
the programmed chips get thrown in the waste basket. What is the effective "com-
puting capacity" of these chips from the viewpoint of the penniless but undaunted
hacker who finds them? How many of these chips would he have to put together in
order to construct his own arcade game, and how many clock cycles of the original
chip would he have to string together to achieve a usable clock cycle for his game?
What in the new game is simply the toggling of a flip-flop may correspond, in the
underlying original game, to the destruction of a stellar empire. For the new user,
proper time will be measured in terms of how fast the evolution of his game can be
made to proceed.
For a macroscopic scavenger, the individual hole positions in a punched tape or
the individual gates in an electronic circuit blend into a continuum, locally charac-
terized by a certain effective density of information-storage capacity and a certain
4.4 CONCLUSIONS
Quantitative features of special relativity and at least qualitative features of general
relativity emerge quite naturally as epiphenomena of very simple computing net-
works. Thus, relativity appears to be of the right form to be an emergent property,
whether or not that is the way it comes about in physics.
5. GENERAL CONCLUSIONS
Many of what are regarded as the most fundamental features of physics happen
to have the right form to be emergent features of a much simpler fine-grained
dynamics.[4]
A century and a half ago, most people were happy with the idea that the cell
was a bag of undifferentiated "protoplasm" governed by some irreducible "vital
force." The behavior of the cell was obviously very rich, but few people dared to
ascribe it to much finer-grained internal machinery, explicitly built according to
immensely detailed blueprints.
Today we know for sure about the existence of such machinery and such
blueprints. Besides molecular genetics, chemistry and nuclear physics provide fur-
ther case histories where complex behavior was successfully reduced to simpler
primitives on a grain a few orders of magnitude finer.
For a physicist, the possibility of explanation by reduction to simpler, smaller
structures is of course one of the first things that comes to mind. The point of
this paper is that one should look for such possibility not only to explain specific
phenomenology, but also to re-examine those general principles that are so familiar
that no "explanation" seems to be needed.
[4]Even invertibility—perhaps the most strongly held feature of microscopic physics—can quite
naturally emerge out of an underlying noninvertible dynamics. We are going to discuss this topic
in a separate paper.
ACKNOWLEDGMENTS
This research was supported in part by the Defense Advanced Research Projects
Agency (N00014-89-J-1988), and in part by the National Science Foundation
(8618002-IRI).
REFERENCES
1. Arnold, Vladimir. Mathematical Methods of Classical Mechanics. Berlin:
Springer-Verlag, 1978.
2. Chopard, Bastien. "A Cellular Automata Model of Large-Scale Moving Ob-
jects." Submitted to J. Phys. A (1989).
3. Hardy, J., O. de Pazzis, and Yves Pomeau. "Molecular Dynamics of a Clas-
sical Lattice Gas: Transport Properties and Time Correlation Functions."
Phys. Rev. A13 (1976):1949-1960.
4. Kuhn, Thomas. The Structure of Scientific Revolutions, 2nd edition.
Chicago: Univ. of Chicago Press, 1970.
5. Margolus, Norman. "Physics and Computation." Ph.D. Thesis, Tech. Rep.
MIT/LCS/TR-415, MIT Laboratory for Computer Science, 1988.
6. Rivest, Ronald, and Adi Shamir. "How to Reuse a 'Write-Once' Memory."
Info. and Control 55 (1982):1-19.
7. Schroeder, Manfred. Number Theory in Science and Communication, 2nd en-
larged edition. Berlin: Springer-Verlag, 1986.
8. Toffoli, Tommaso, and Norman Margolus. Cellular Automata Machines—A
New Environment for Modeling. Cambridge: MIT Press, 1987.
9. Toffoli, Tommaso. "Four Topics in Lattice Gases: Ergodicity; Relativity; Information Flow; and Rule Compression for Parallel Lattice-Gas Machines."
In Discrete Kinetic Theory, Lattice Gas Dynamics and Foundations of Hydro-
dynamics, edited by R. Monaco. Singapore: World Scientific, 1989, 343-354.
10. Toffoli, Tommaso. "Analytical Mechanics from Statistics: T = dS/dE Holds
for Almost Any System." Tech. Memo MIT/LCS/TM-407, MIT Laboratory
for Computer Science, August 1989.
Xiao-Jing Wang
Center for Studies in Statistical Mechanics, University of Texas, Austin, TX 78712 (current
address: Mathematical Research Branch, NIDDK, National Institutes of Health, Bldg. 31,
Room 4B-54, Bethesda, MD 20892, USA)
I. INTRODUCTION
We shall summarize here succinctly some recent progress in our understanding of
intermittent phenomena in physics. Intermittency often refers to random, strong
deviations from regular or smooth behavior. Consider, for instance, an iterative
dynamical system (Figure 1)
FIGURE 1 The Manneville-Pomeau map, with a countable partition of the phase space
(0,1).
and the behavior is then said to be chaotic. An entropy is also well defined for
dynamical systems, thanks to A. Kolmogorov and Y. Sinai.6 The idea is that a
deterministic chaotic system admits a discrete generating partition of its phase
Intermittent Fluctuations and Complexity 321
which implies λₙ/n → 0, and the dynamic stability is stretched exponential rather
than exponential. This kind of behavior was called "sporadic."8 We shall show that
it represents a special class of intermittent systems, with the algorithmic complexity
of Kolmogorov and Chaitin of a form intermediate between the predictable and random cases.
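A short numerical sketch, assuming the Manneville-Pomeau map has the standard form x_{n+1} = x_n + x_n^z (mod 1) (an assumption, since the equation itself is not reproduced above), exhibits the long laminar phases behind this behavior: near the marginal fixed point at x = 0, the escape time from a starting point x₀ grows roughly like 1/x₀ for z = 2.

```python
def mp_step(x: float, z: float) -> float:
    """One iterate of the (assumed) Manneville-Pomeau map."""
    return (x + x ** z) % 1.0

def escape_time(x0: float, z: float, threshold: float = 0.5) -> int:
    """Number of iterates needed to leave the laminar region [0, threshold)."""
    x, n = x0, 0
    while x < threshold:
        x = mp_step(x, z)
        n += 1
    return n

# For z = 2 the continuum estimate dx/dn = x^2 gives n ~ 1/x0:
for x0 in (1e-2, 1e-3, 1e-4):
    print(x0, escape_time(x0, 2.0))
```

Halving x₀ roughly doubles the dwell time, so reinjections near 0 produce the heavy-tailed laminar episodes responsible for the intermittent statistics discussed below.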
A fully disordered or random system is sometimes perceived as simple rather
than complex, mostly when its fluctuations appear small and inconspicuous. On the
other hand, intermittency is generally characterized by abnormal fluctuations
and a 1/f-noise-like power spectrum,1,9 and local fluctuations around the mean of
an observable may obey a Lévy, rather than a Gauss, distribution. More complete
information is provided by knowledge about large fluctuations, using the thermo-
dynamic formalism of Sinai, Ruelle and Bowen for the dynamical systems.16 The
SRB theory furnished a rigorous connection between dynamical systems and equi-
librium statistical mechanics in one dimension (on the time axis). In this framework
abnormal behaviors of large deviations in Eq. (1) are to be treated as a problem of
phase transition in its statistical mechanical counterpart.20,21
To sum up, one can look at the mean, local fluctuations, large deviations, and
equilibrium statistical mechanics, in order to achieve increasingly detailed descrip-
tions of intermittent systems. Furthermore, one can also study its unusual nonequi-
librium properties, following a suggestion of G. Nicolis, et al.14,15 In what follows
we shall take Eq. (1) for a case study to illustrate how each of these levels of con-
sideration leads to insights into new aspects of fluctuations and complexity of the
intermittent processes. Sections II and III are devoted to the equilibrium properties of
Eq. (1); and in Section IV nonequilibrium is discussed. Some other examples with
close analogy to the intermittent system will be mentioned in Section V, including
a discrete one-dimensional model of Anderson localization in disordered matter.
Initially observed in fluid turbulence, intermittency has more recently been
evidenced in other physical systems as diverse as the large-scale structure of the
universe and the hadronic multiparticle production in high-energy physics. Little is
known, as yet, beyond the phenomenology of these spatially extended processes.
This "predictable" class of strings includes periodic ones (for which it suffices to
specify the pattern of one period in the program), as well as a large set of aperiodic
ones. An often-cited example is the digital expansion of the mathematical constant
π = 3.141592..., for which a convergent series representation exists (e.g., in
the sum of S. Ramanujan's series each successive term adds roughly eight correct
digits).
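The predictable/random distinction can be illustrated, imperfectly, with an ordinary compressor standing in for the shortest program: a periodic string compresses to almost nothing ("copy the period"), while a seeded-random bit string stays near one bit per symbol.

```python
import random
import zlib

def compressed_size(s: str) -> int:
    """Length in bytes of the zlib-compressed string (a crude,
    computable stand-in for Kolmogorov complexity)."""
    return len(zlib.compress(s.encode(), 9))

periodic = "0110" * 2500                      # 10,000 symbols, period 4
rng = random.Random(42)
noisy = "".join(rng.choice("01") for _ in range(10_000))

print(len(periodic), compressed_size(periodic))   # tiny: "copy the period"
print(len(noisy), compressed_size(noisy))         # ~ n/8 bytes: copy the bits
```

A general-purpose compressor only bounds the true complexity from above, but the gap between the two cases is already several orders of magnitude.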
On the other hand, for a random sequence where no regularity can be found,
one could only "copy" the whole string, bit by bit, so that
with the transition probability p₀ₙ, and the invariant measure μ(Aₙ), fulfilling

p₀ₙ ~ 1/n^{1+α} ; μ(Aₙ) ~ 1/n^α . (10)
with s₀ = 0. Using the theory of recurrent events, one can then show that

E(K(Sₙ)) ~ n             if 3/2 < z < 2 ;
E(K(Sₙ)) ~ n^{1/(z−1)}   if 2 < z ;        (14)
E(K(Sₙ)) ~ n / log n     if z = 2 .
For z < 3/2, there exists a central limit theorem asserting the Gaussian character of the fluctuations. For 3/2 < z, on the other hand, they obey a generalized limit
theorem involving the Lévy stable distribution g_α(x) with 0 < α < 2.13 (g_{α=2}(x) is
the familiar Gauss law. A Lévy distribution with α < 2 enjoys the same kind of
genericity as a Gauss distribution for sums of independent random variables with a common
distribution, only now this latter distribution has an infinite second moment.)
In both cases, the correlation function is a power law. This tells us that local
fluctuations are not sufficient to characterize the abnormality of the system. In fact,
a central limit theorem is concerned with fluctuations of the form

(1/n) Σ_{k=0}^{n-1} g(x_k) − E(g) ≤ c σ/√n , −∞ < c < +∞ . (18a)
where E(g) stands for the mean value of an observable g(x). Instead of (18a), one
can consider large deviations, i.e.,

(1/n) Σ_{k=0}^{n-1} g(x_k) ∈ (A, A + dA) (18b)
for all possible values A, not necessarily near its mean value. If g(x) = log|f'(x)|, then the left side of Eq. (18b) tends to the Liapounov exponent λ, and one is dealing with

    U_n(s₀s₁s₂⋯s_{n−1}) = (1/n) Σ_{k=0}^{n−1} log|f'(f^{(k)}(x))| ,   x ∈ Δ(s₀s₁s₂⋯s_{n−1}) ,        (19)
where Δ(s₀s₁s₂⋯s_{n−1}) is the cell in the phase space coded by (s₀s₁s₂⋯s_{n−1}).
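Eq. (19) is simply a finite-time (Birkhoff) average of log|f'| along an orbit. A minimal numerical check, using the full tent map as a stand-in example (for it |f'| = 2 almost everywhere, so U_n must equal log 2 on any orbit):

```python
import math

def tent(x):
    """Full tent map on [0, 1]; |f'(x)| = 2 almost everywhere."""
    return 1.0 - abs(1.0 - 2.0 * x)

def u_n(f, log_abs_df, x0, n):
    """U_n = (1/n) * sum_{k=0}^{n-1} log|f'(f^(k)(x0))|, cf. Eq. (19)."""
    x, total = x0, 0.0
    for _ in range(n):
        total += log_abs_df(x)
        x = f(x)
    return total / n

u = u_n(tent, lambda x: math.log(2.0), 0.1234, 1000)
```

For a generic chaotic map the same average converges to the Liapounov exponent as n grows, which is the content of the discussion around Eq. (18b).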
A fundamental result in the SRB theory states that the invariant measure of a dynamical system is given by

    μ(Δ(s₀s₁⋯s_{n−1})) ∼ exp[−n U_n(s₀s₁⋯s_{n−1})] .        (20)
A detailed analysis of the intermittent system (1) (or rather a piecewise linear
approximation of it) is carried out in Wang.20,21 Let us call attention to some main
conclusions therefrom. It was found that its statistical mechanical counterpart bears
a close analogy to a droplet model of condensation proposed by Fisher7 about 25
years ago. The clusters of "laminar" states are similar to the clusters of particles
(the droplets) in Fisher's model of gas-liquid phase transition; there are many-body
interactions within each cluster, which results in a surface energy of logarithmic
type.
On the pressure-temperature plane, there are two thermodynamic phases sep-
arated by a critical line. They are respectively the chaotic ("gas") and periodic
("condensed") states; and the intermittent state is located on the co-dimension one
critical curve of phase transition. This is true for all 1 < z, regardless of whether
the local fluctuations are Gaussian or not. Therefore, the abnormal large fluctua-
tions may be detected as a phase transition of the associated statistical mechanical
system. The identification of the interaction potential and of the types of resulting critical phenomena constitutes the finest characterization and a universal classification of such intermittent dynamical systems.
where f_n^{(i)} denotes the probability of first recurrence at time n of the state i, has a radius of convergence greater than unity.
Applying this theorem to the state A₀ in our case, with the probability of first recurrence given by f_n = p_{0(n−1)} ∼ 1/n^{1+α} [Eq. (10)], immediately implies that the radius of convergence of the sum (21) is unity. Hence, one concludes that the convergence to equilibrium is slower than any exponential law.
Let us indicate why this might be expected from the viewpoint of the spectral
properties of the transition matrix (9). According to the Perron-Frobenius theory,17
a finite nonnegative matrix, say, (a_{ij}), i, j = 1, 2, …, m, possesses a unique maximum eigenvalue λ₀ such that
    min_i Σ_{j=1}^{m} a_{ij} ≤ λ₀ ≤ max_i Σ_{j=1}^{m} a_{ij} .        (22)
For the transition matrix of a finite Markov chain, this sum is Σ_j p_{ij} = 1. Thus, λ₀ = 1, and all the other eigenvalues have a modulus less than 1. Any initial distribution will then be projected onto the (non-negative) eigenvector associated with λ₀, i.e., the invariant measure, and all the other components vanish in an exponential fashion.
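The exponential relaxation described above is easy to exhibit for a finite chain. A minimal sketch, using a hypothetical 3-state row-stochastic matrix (the entries are illustrative, not taken from the text):

```python
def step(dist, P):
    """Propagate a probability distribution one step through a
    row-stochastic transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical 3-state chain; every row sums to 1, so lambda_0 = 1.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

dist = [1.0, 0.0, 0.0]      # arbitrary initial distribution
for _ in range(200):        # components off the invariant measure decay
    dist = step(dist, P)    # exponentially fast (spectral gap below 1)
```

After a few hundred steps the distribution is stationary to machine precision, precisely because all subdominant eigenvalues have modulus strictly less than 1.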
Now, for a countable Markov chain, the transition matrix has denumerably many eigenvalues, so that the eigenvalue λ₀ = 1 may be approached arbitrarily closely by other eigenvalues. This is indeed the case for the model of intermittency. Let us
sketch the argument. Truncating the transition matrix (9) up to n, one obtains a
finite matrix Wn of which the characteristic equation is
    F_n(λ) = λ^n − p₀₀λ^{n−1} − p₀₁λ^{n−2} − ⋯ − p_{0(n−2)}λ − p_{0(n−1)} = 0 ,        (23)

with lim_{n→∞} F_n(λ = 1) = 0. It follows from the Perron-Frobenius theory that all the roots of Eq. (23) are confined inside the unit disc.
Considering F_n(λ) in Eq. (23) as the partial sum of an infinite series, one can readily see that the radius of convergence of this series is 1. Now it is useful to invoke a remarkable theorem in analysis, due to R. Jentzsch,18 which asserts that for every power series, every point of the circle of convergence is a limit-point of zeros of partial sums. Hence, the λ₀ = 1 of the countable chain is a limit-point of
roots of Eq. (23). On the other hand, it is reasonable to assume that, as n → ∞, every such root is arbitrarily near to one of the true eigenvalues of the infinite matrix (9). One concludes therefore that λ₀ = 1 is not isolated. This suggests, although
it does not ensure, that the approach to equilibrium may not have an exponential
rate.
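The drift of the truncated top eigenvalue toward λ₀ = 1 can be checked numerically. The sketch below divides the characteristic equation (23) by λ^n, so that the largest root solves 1 = Σ_t f_t μ^t with μ = 1/λ, and assumes first-return probabilities f_t ∝ t^{−(1+α)}; the value α = 0.5 is an arbitrary illustrative choice.

```python
def truncated_top_eigenvalue(n, alpha=0.5):
    """Largest root of the truncated characteristic equation (23), found
    from 1 = sum_t f_t * mu**t with mu = 1/lambda, where the f_t are
    (approximately normalized) power-law first-return probabilities."""
    norm = sum(t ** -(1.0 + alpha) for t in range(1, 100_000))
    f = [t ** -(1.0 + alpha) / norm for t in range(1, n + 1)]

    def G(mu):  # increasing in mu; G(mu*) = 1 locates the root
        return sum(ft * mu ** t for t, ft in enumerate(f, start=1))

    lo, hi = 1.0, 2.0
    while G(hi) < 1.0:      # bracket the root
        hi *= 2.0
    for _ in range(100):    # bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if G(mid) < 1.0 else (lo, mid)
    return 1.0 / hi         # lambda* = 1/mu* < 1, creeping up to 1

r10, r200 = truncated_top_eigenvalue(10), truncated_top_eigenvalue(200)
```

The larger truncation has a top eigenvalue strictly closer to 1, the numerical signature of the non-isolated λ₀ = 1 and of the slower-than-exponential relaxation.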
V. CONCLUDING REMARKS
There is an increasing number of systems to which the present work appears rele-
vant. One such case, cited in Gaspard and Wang,8 is the Markov model defined on a tree that was proposed by J. Meiss and E. Ott for Hamiltonian chaos, in the presence of a hierarchy of "cantori." Another example is a model of abnormal diffusion
in resistively shunted Josephson junctions, which takes a form similar to Eq. (1).9
Perhaps more surprisingly, it has been noticed3 that a discrete one-dimensional
model of Anderson localization seems also somewhat akin to the intermittent map
Eq. (1). Let us end this paper with a few remarks on this intriguing finding.
The one-dimensional Anderson model is a 1-D discrete Schrödinger equation with a random potential {V_n},

    ψ_{n+1} + ψ_{n−1} − 2ψ_n + V_n ψ_n = E ψ_n ,        (24)

which, in terms of the ratio R_n = ψ_{n+1}/ψ_n, is equivalent to the one-dimensional map

    R_n = E + 2 − V_n − 1/R_{n−1} .        (25)
All the states within the pure energy band [-4, 0] are localized in the presence
of a random potential {Vn}, with the inverse localization length directly given by
the Liapounov exponent of the map (25). For V_n ≡ 0, the map (25) is displayed in Figure 2, with log|f'(R_n)| = −2 log|R_n|. Locally around R = 1, the mapping at the band edge E = 0 looks the same as Figure 1, with z = 2, and it is of great interest to recognize such a resemblance between Anderson localization and intermittency.
There are, however, notable differences which perhaps should not be overlooked.
Contrary to intermittent systems, here the mapping is invertible, and the Liapounov exponent is always zero even for E < 0, if the stochastic term is absent5 (obviously, no localization is possible without a random potential). Besides, due to the linear character of Eq. (24), the Thouless formula asserts a direct relationship between
the inverse localization length and the integrated density of states. Interpreted in
terms of the dynamics in Eq. (25), the former is the Liapounov exponent while the
latter, being related to the number of nodes of the wave function, is also the number
of times that Rn is negative in the lattice, hence assimilable to the "turbulent time"
(cf. Figure 2). Such a "dispersion relation" between the Liapounov exponent and
the turbulent time, however, does not seem to exist for the nonlinear intermittent
system: it would lead to the erroneous conclusion that the Liapounov exponent in
the latter case is also identically zero in the absence of noise.
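The quantities being compared here can be estimated directly from the map. The sketch below assumes the standard recursion R_n = E + 2 − V_n − 1/R_{n−1} for the ratio R_n = ψ_{n+1}/ψ_n (a reconstruction consistent with the band [−4, 0] and with log|f'(R_n)| = −2 log|R_n| quoted above); the disorder strength W and all other parameters are illustrative.

```python
import math
import random

def anderson_orbit(E, W, n, seed=1):
    """Iterate the Anderson-model map with V_n uniform on [-W/2, W/2].
    Returns (gamma, turbulent_fraction): gamma = (1/n) * sum log|R_k|
    estimates the inverse localization length, while the fraction of
    steps with R_k < 0 counts sign changes of the wave function and so
    plays the role of the "turbulent time" (integrated density of states)."""
    rng = random.Random(seed)
    R, log_sum, neg = 1.5, 0.0, 0
    for _ in range(n):
        V = rng.uniform(-W / 2.0, W / 2.0)
        R = E + 2.0 - V - 1.0 / R
        log_sum += math.log(abs(R))
        neg += R < 0.0
    return log_sum / n, neg / n

gamma, turb = anderson_orbit(E=-2.0, W=3.0, n=20_000)
# With disorder on, gamma > 0 (band states localized); the text notes the
# exponent vanishes inside the band when the stochastic term is switched off.
```

Near the band center roughly half of the R_n are negative, so the "turbulent time" tracks the node count of the wave function as the text describes.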
On the other hand, the thermodynamic description of Eq. (1)21 does provide a connection between the entropy S (here equivalent to the Liapounov exponent) and the density of the laminar phase ρ (i.e., one minus the fraction of the turbulent time). Both S and ρ are functions of the two thermodynamic variables β (inverse of temperature) and μ (chemical potential), and are related to each other by the fundamental equation of thermodynamics.
FIGURE 2 The map Eq. (25) in the absence of the noise term. It arises from a one-dimensional Anderson model.
ACKNOWLEDGMENT
It is a pleasure to thank warmly Professor G. Nicolis for his continuous help, en-
couragement, and fruitful correspondence. This work was partly supported by the
Department of Energy under contract number DE-AS05-81ER10947. Sincere thanks
are also due to the Center for Statistical Mechanics at University of Texas for fi-
nancial support of my attendance at the Santa Fe Institute Workshop.
Intermittent Fluctuations and Complexity 329
REFERENCES
1. Ben-Mizrachi, A., I. Procaccia, N. Rosenberg and A. Schmidt. "Real and Ap-
parent Divergences in Low-Frequency Spectra of Nonlinear Dynamical Sys-
tems." Phys. Rev. A31 (1985):1830-1840.
2. Bergé, P., Y. Pomeau and Ch. Vidal. L'ordre dans le Chaos. Paris: Hermann,
1984.
3. Bouchaud, J. P., and P. Le Doussal. "Intermittency in Random Optical Lay-
ers at Total Reflection." J. Phys. A: Math. Gen. 19 (1986):797-810.
4. Chaitin, G. Algorithmic Information Theory. Cambridge: Cambridge Univer-
sity Press, 1987.
5. Derrida, B., and E. Gardner. "Lyapounov Exponent of the One-Dimensional
Anderson Model: Weak Disorder Expansion." J. Physique 45 (1984):1283-
1295.
6. Eckmann, J.-P., and D. Ruelle. "Ergodic Theory of Chaos and Strange At-
tractors." Rev. Mod. Phys. 57 (1985):617-656.
7. Fisher, M.E. "The Theory of Condensation and the Critical Point." Physics,
(published in Great Britain) 3 (1967):255-283.
8. Gaspard, P., and X.-J. Wang. "Sporadicity: Between Periodic and Chaotic
Behaviors." Proc. Natl. Acad. Sci. 85 (1988):4591-4595.
9. Geisel, T., J. Nierwetberg, and A. Zacherl. "Accelerated Diffusion in the
Josephson Junctions and Related Chaotic Systems." Phys. Rev. Lett. 54
(1985):616-619.
10. Kendall, D. G. "Geometric Ergodicity and the Theory of Queues." In Mathe-
matical Methods in the Social Sciences, edited by K. J. Arrow, S. Karlin, and
P. Suppes. Palo Alto: Stanford University Press, 1960,176-195.
11. Kolmogorov, A. N. "Combinatorial Foundations of Information Theory and
the Calculus of Probabilities." Russian Math. Surveys 34 (1983):29-40.
12. Mandelbrot, B. B. "Sporadic Random Functions and Conditional Spectral
Analysis: Self-Similar Examples and Limits." In Proceedings of the Fifth
Berkeley Symposium on Mathematical Statistics and Probability, edited by L.
LeCam and J. Neyman. Berkeley: University of California Press, 1967,155-
179.
13. Montroll, E. W., and B. J. West. "On an Enriched Collection of Stochastic
Processes." In Fluctuation Phenomena, edited by E. W. Montroll and J. L.
Lebowitz. Revised edition. Amsterdam: North-Holland, 1987,61-206.
14. Nicolis, G., and C. Nicolis. "Master-Equation Approach to Deterministic
Chaos." Phys. Rev. A38 (1988):427-433.
15. Nicolis, G., C. Nicolis and E. Tirapegui. "Chaotic Dynamics, Markovian
Coarse-Graining and Information." Preprint, University of Brussels, 1989.
16. Paladin, G., and A. Vulpiani. "Anomalous Scaling Laws in Multifractal Ob-
jects." Phys. Rep. 156 (1987):147-225.
17. Seneta, E. Non-Negative Matrices. New York: John Wiley & Sons, 1973.
We simply do not know how well the brain tackles computational problems of
varying degrees of complexity.
How perceptive are we? Suppose that it could be established that the visual
system performs optimally. Then we can go on to ask what sort of computation
is necessary in order to achieve this level of performance, and to explore in detail
the types of design that may be required.
2. In the perception literature, a number of models of visual processing have been
proposed. We raise the question of how such models can be falsified by ex-
periment. Obviously, it is of prime importance to progress from qualitative
discussions to quantitative tests, in order to determine whether a given model
is in fact viable. We try to compute the performance that the visual system is
capable of according to each of these models. If this performance falls signifi-
cantly below the experimentally measured performance, then clearly the model
can be ruled out. A large class of perceptual models corresponds to the steepest
descent or mean field approximation in our formulation. An important issue is
whether this approximation is adequate in explaining human performance.
PERFORMANCE
OPTIMAL PERFORMANCE
To discuss how well the visual system performs, we must immediately raise an
obvious question: What are we to compare the performance of the visual system
with? The only natural standard, it appears to us, is the optimal performance
allowed by information theory, that is, the performance attainable if every bit of
information received by the visual system is used.
First, we must choose a "naturalistic" task well suited to the visual system but
which also allows a precise mathematical formulation amenable to rigorous analysis.
Since we suspect that the visual system can in fact perform at or near optimum,
we also want the task to be computationally difficult. We chose the discrimination
between patterns with noise and distortion added.[1]
More precisely, we propose an experiment in which the subject is first acquainted with two patterns, described by φ₀(x) and φ₁(x). Here x denotes the coordinates of the two-dimensional visual field. A black and white pattern is described by a scalar field φ(x), where φ(x) is equal to the contrast, that is, the logarithm of the intensity of the pattern at the point x.
For each trial, the experimenter chooses either φ₀ or φ₁, with equal probability, say. Suppose φ₀ is chosen. Then the pattern is distorted and obscured with noise
[1]Experiments similar to (but simpler than) the ones proposed here have been done by Barlow.2,3 Also see Bialek and Zee.4-7
Information Processing in Visual Perception 333
so that the subject actually sees φ(x) = φ₀(y(x)) + η(x). Here x → y(x) defines an arbitrary one-to-one mapping of the plane onto itself. The noise η(x) is taken for simplicity to be Gaussian and white. Thus, the conditional probability of seeing φ(x) were φ₀ chosen is given by

    P(φ|φ₀) = (1/Z) ∫ Dy e^{−W(y) − (β/2) ∫ d²x [φ(x) − φ₀(y(x))]²} .

The functional W(y) should favor gentle distortions, for which y(x) ≈ x. More on W later. Here Z is a normalization factor required by ∫ Dφ P(φ|φ₀) = 1. Henceforth, we will often neglect to write the normalization factor. Evidently, a probability P(φ|φ₁) can also be defined by substitution. The subject is to decide whether the pattern seen corresponds to φ₀ or φ₁.
The patterns φ₀ and φ₁ should be abstract so as to eliminate any possible
biological or cultural bias, such as those associated with our finely developed ability
to recognize human faces. We are interested in perception rather than cognition.
DISCRIMINABILITY
The information-theoretic optimal performance can then be computed according
to standard signal detection theory. A particularly relevant quantity is the discriminability. Define the discriminant as Λ(φ; φ₀, φ₁) = log[P(φ|φ₀)/P(φ|φ₁)]. With this definition, the discriminant is positive when the probability P(φ|φ₀) is larger than P(φ|φ₁) and negative when the opposite holds. (The logarithmic form for the discriminant is chosen for convenience. Some other monotonic function of the ratio of the two probabilities P(φ|φ₀) and P(φ|φ₁) may serve equally well.) It can be shown that optimal discrimination is accomplished by maximum likelihood. In plain English, the optimal strategy is to identify the pattern as φ₀ if Λ(φ; φ₀, φ₁) is positive, and as φ₁ if Λ(φ; φ₀, φ₁) is negative. This is, of course, precisely the strategy that
any sensible person capable of knowing Λ will adopt. Having seen the image φ(x), we have to decide whether it is more likely that the image "came" from φ₀(x) or from φ₁(x). (Thus, the experiment implies a "learning phase" in which the subject tries to "figure out" the relevant probability distributions. We are interested in the performance reached after learning. This, of course, accounts for our insistence on "naturalistic" tasks, for which the necessary learning has already been accomplished through eons of evolution.)
The probability distribution of Λ if φ₀ is chosen is defined by P(Λ|φ₀; φ₀ vs. φ₁) = ∫ Dφ δ(Λ(φ; φ₀, φ₁) − Λ)P(φ|φ₀). Similarly, P(Λ|φ₁; φ₀ vs. φ₁) can be defined. The discriminability, conventionally called (d')², is defined as

    (d')² = [⟨Λ⟩₀ − ⟨Λ⟩₁]² / {½ [⟨(ΔΛ)²⟩₀ + ⟨(ΔΛ)²⟩₁]} ,

where the subscript i = 0, 1 indicates that the corresponding expectation value should be taken in the distribution P(Λ|φ₀; φ₀ vs. φ₁) and P(Λ|φ₁; φ₀ vs. φ₁), respectively. The meaning of (d')² is obvious: it measures the overlap between the two probability distributions when the two distributions are bell shaped. As the name suggests, the discriminability (d')² limits the extent to which one can discriminate between φ₀ and φ₁. We are generally interested in the regime (d')² ≈ 0, when the visual discrimination task is highly "confusing." (When the distributions are bell shaped, the discriminant can obviously be related to the percentage of correct guesses. Incidentally, the discriminability (d')², rather than some other more-or-less equivalent quantity, is used because, being formed of "naturally occurring" expectation values, it can be readily computed for certain simple problems and because experimentalists in this field typically quote their observations in terms of (d')². The discriminability provides a convenient summary of the information contained in the two P(Λ)'s.)
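The relation between (d')² and the percentage of correct guesses is explicit in the bell-shaped (Gaussian) case. The toy sketch below works with scalar Gaussian Λ distributions rather than the full functional integral (a deliberate simplification): the ideal observer answers φ₀ whenever Λ > 0, and its probability of a correct call is Φ(d'/2), with Φ the standard normal cumulative distribution.

```python
import math

def d_prime_squared(mu0, mu1, var):
    """(d')^2 for two Gaussian Lambda distributions with common variance:
    the squared separation of the means over the variance."""
    return (mu0 - mu1) ** 2 / var

def percent_correct(d2):
    """Ideal-observer probability of a correct call, Phi(d'/2)."""
    return 0.5 * (1.0 + math.erf(math.sqrt(d2) / (2.0 * math.sqrt(2.0))))
```

(d')² = 0 means the two Λ distributions coincide and performance sits at chance (50%); a (d')² of a few units already permits nearly perfect discrimination, which is why the "confusing" regime of interest is (d')² near zero.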
A quite different theory would suggest that we look for and identify "features"
such as edges between predominantly black areas and predominantly white areas
in the patterns φ₀ and φ₁.
Suppose experiments show the actual performance to be substantially below
optimal performance. What would that mean? It might mean that the visual sys-
tem is capable of only a crude approximation in evaluating the functional integral
involved. It would then be interesting to determine what approximation the visual
system uses. This is certainly possible in principle. Alternatively, it might mean that
the visual system can evaluate the relevant functional integral fairly accurately, but
that the W used by the experimenter does not correspond to the W that we "carry
in our heads."
In principle, the experimenter can carry out a series of experiments, each with a
different W, all corresponding to "reasonable" choices. Suppose the optimal perfor-
mance can be determined for each W. It could happen that the actual performance
does not come close to the optimal performance for any of these W's. Perhaps
more interestingly, it could also happen that the actual performance reaches or
comes close to the optimal performance for some W's.
The correspondence with statistical mechanics also suggests the question of
whether some sort of universality might play an essential role in visual perception.
We can also ask whether the corresponding statistical-mechanical system exhibits
short-ranged or long-ranged correlation. In this connection, we may perhaps em-
phasize that two logically distinct issues surface in our program. First, we have the
question of whether the visual system can attain the optimal performance theoret-
ically attainable. Next, given that this optimal performance is in fact attained, we
can ask what are the computations necessary to attain this performance.
We would like to conjecture that actual performance does in fact come close to
optimal performance for some reasonable W. If experiments verify our conjecture,
then we are confronted by the interesting issue of the type of circuitry and algorithm
capable of effectively evaluating the functional integral involved.
strongly non-local functionals of image intensity, no such abrupt drop will be ob-
served. These experiments will be difficult, but they have the potential of providing
serious challenges to our understanding of computation in the nervous system.
Our suspicion is that the system can solve non-local problems, and that there
are interesting theoretical questions to be answered about the algorithms and hard-
ware responsible for such contributions. Suspicions aside, the approach described
here4,7 provides the tools for asking very definite questions about the computational
abilities of the brain.
Unfortunately, it is well-nigh impossible to evaluate functional integrals exactly.
After all, the exact evaluation of a functional integral amounts to the exact solution
of a statistical-mechanical system or of a quantum field theory. The history of
statistical mechanics and quantum field theory testifies amply to the difficulty of
the task. Thus, in our work we are reduced to trying various approximations to
the functional integrals, often reaching only qualitative conclusions. (Of course, the
functional integrals can also be evaluated numerically.)
Instead of trying to evaluate P(φ|φ₀), we have also tried to extract some general features. In particular, we have considered using the renormalization group to study the properties of P(φ|φ₀) in an attempt to discover a strategy for "universal computation" in processing visual information.8
It is the interplay between noise and distortion that makes the evaluation of the
field theory defined by P(0100) so difficult. If either noise or distortion is omitted,
the task of evaluating P(014) becomes considerably simpler. (In particular, with
no distortion, the problem becomes Gaussian and trivial.) Why do we make life
miserable for ourselves? Because we want to appreciate the difficulty of a task that
the visual system performs extremely well (at least according to the subjective
evidence). Indeed, our work represents largely a record of our awakening to how
difficult the computations involved are. The difficulty of this task is also reflected in
the fact that machines with artificial vision have not mastered this task of "invariant
perception." Indeed, as far as we know, current machines have difficulty recognizing
images if the image can be arbitrarily rigidly translated, rotated, and dilated. Of
course, it may also turn out that our visual system does not perform as well as we
think it does.
MODELS
FEATURE DETECTORS
In the second part of our work, we attempt to capture, in quantitative models,
the essence of some leading theories of perception. We then compute the predicted
performance at a "naturalistic" perceptual task, with the aim of ultimately compar-
ing whatever results we may obtain with actual experiments on the human visual
system.
Information Processing in Visual Perception 337
For example, consider the feature-detector theory, which originated in the neurophysiological experiments of the 1950's.[2] Neurons in the visual system are assumed
to compute nonlinear functionals of the image intensity and thus signal the presence
of features in the image. Thus, the continuous pattern φ(x) is converted into a set of discrete "feature tokens" to be processed by subsequent layers of neurons. We
attempt to capture the essence of this theory by taking the simplest possibility for the feature tokens: they are Ising spins σ_μ, located at x_μ, μ = 1, 2, …, N, with σ_μ taking on values ±1. The image is sampled at x_μ to give φ_μ = ∫ d²x f(x − x_μ)φ(x), where f(x − x_μ) represents the response function of a feature-detector neuron located at x_μ. The response function f(x), with its excitatory center and inhibitory surround, is well known to neurophysiologists.13 It is often modeled as the Laplacian of a Gaussian ∇²G or as the difference of two Gaussians.[3]
Our model is that σ_μ tends to be +1 when φ_μ is positive and −1 when φ_μ is negative, as described by some probability distribution P(σ|φ). Putting it together, we have the conditional probability P(σ|φ₀) = ∫ Dφ P(σ|φ)P(φ|φ₀). In other words, the experimenters (or the natural environment we live in) turn the known image φ₀ into φ. The seen image φ is then processed into the "feature tokens" σ_μ.
We believe that this "Ising" model is prototypical of a large family of models which replace the continuous image φ(x) by discrete and local feature tokens. It
contains one of the classic feature-detector ideas concerning the extraction of edges,
a concept formalized by Marr, Poggio, and others as the location of contours where
some appropriately filtered version of the image vanishes. Here the "domain walls"
between spin-up and -down regions mark the zero-crossing contours, so in fact this
spin representation has a bit more information than a "sketch" based on zero-
crossing contours alone.
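A minimal sketch of the token-forming step: sample the image with difference-of-Gaussians (center-surround) filters and keep only the sign, giving the Ising spins σ_μ. All numerical parameters here (filter widths, sampling points, the test image) are hypothetical.

```python
import math

def dog(r2, sigma_c=1.0, sigma_s=2.0):
    """Difference-of-Gaussians profile: excitatory center, inhibitory surround."""
    center = math.exp(-r2 / (2 * sigma_c ** 2)) / (2 * math.pi * sigma_c ** 2)
    surround = math.exp(-r2 / (2 * sigma_s ** 2)) / (2 * math.pi * sigma_s ** 2)
    return center - surround

def feature_tokens(phi, centers):
    """phi: dict {(x, y): contrast}.  Returns the Ising spins sigma_mu = +/-1,
    the sign of the filtered image phi_mu at each sampling point x_mu."""
    spins = []
    for (cx, cy) in centers:
        phi_mu = sum(v * dog((x - cx) ** 2 + (y - cy) ** 2)
                     for (x, y), v in phi.items())
        spins.append(1 if phi_mu >= 0 else -1)
    return spins

# A single bright point at the origin: the cell centered on it goes to +1,
# while a cell three units away sees mostly inhibitory surround and goes to -1.
spins = feature_tokens({(0, 0): 1.0}, [(0, 0), (3, 0)])
```

The domain wall between the +1 and −1 cells marks a zero crossing of the filtered image, exactly the edge-token idea described above.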
The point is that these models are sufficiently well defined so that they can be
studied numerically or analytically in certain limits. For instance, if the range of the response function f(x) is small compared to the intercellular spacing (which is not biologically reasonable), the maximum efficiency can be seen to be 2/π ≈ 0.64, which appears low compared to experimental reports of efficiency ranging from 0.5 to 0.95.
We conclude that overlaps of the receptive fields of neighboring cells are essential
for understanding the observed efficiency of visual perception. Furthermore, these
overlaps must be negative to enhance (d')², which necessitates the excitatory-center, inhibitory-surround type of organization found for real neurons. Obviously, we can go on to consider variations of this model. For instance, the work of Hubel and Wiesel established that certain cells are selectively sensitive to directions.13 Thus, instead of Ising spins, we can consider vector ("Heisenberg") spins.
[2]For a brief review of the history of feature detectors, see Barlow.2,3
[3]For example, see Kuffler12 and Parker and Hawken15; also Albrecht,1 p. 117.
338 A. Zee
LINEAR FILTERS
Another class of models that we have considered supposes that the detectors in
the visual system act as linear filters.9 In other words, each detector functions as
a narrow-band Fourier analyzer centered at some characteristic spatial frequency.
Models of this type are suggested by the work of Campbell, Robson, Lawden, and
DeValois.10,11
    P(φ|φ₀) = (1/Z) ∫ dρ e^{−(β/2) ∫ d²x [φ(x) − φ₀(x_ρ)]²} ,

where ρ parametrizes a family of distortions. For example, ρ can be an angle θ and x_ρ is equal to x rotated through θ. We also take φ₁ = 0 for simplicity. Thus, the subject is to decide on the presence or absence of the prototype pattern φ₀ with the pattern obscured by noise and presented with a randomly chosen orientation on each trial. We are interested in the regime in which the noise becomes overwhelming. (By simple scaling, we see that quantities such as (d')² depend only on the product of β with the squared pattern amplitude, so the β can be absorbed.) If ∫ d²x φ₀²(x_ρ) = ∫ d²x φ₀²(x), as is the case for the examples we have considered, the relevant functional integral can be organized in the suggestive form
    P(φ|φ₀) = (1/Z) ∫ dρ e^{−βH₀(φ) − βH₁(φ, ρ)} ,

where the steepest-descent (best-match) evaluation of the integral over ρ defines (d')²_steepest descent, the efficiency is the ratio ε = (d')²/(d')²_steepest descent, and the overlap of two distortions is

    Π(ρ, ρ') = ⟨H₁(φ, ρ) H₁(φ, ρ')⟩ = 2β ∫ d²x φ₀(x_ρ) φ₀(x_ρ') .
To see what is actually going on, we can now try out various specific prototype
pictures φ₀(x). For example, we have considered a wedge- or leaf-shaped picture φ₀(x) = f(r)e^{−θ²/[2C(r)]}, where r and θ are the polar coordinates of x. The important quantity here is the width of the wedge C(r) (which we take to be small).
We find that ε is determined by the radial averages ⟨C⁻¹⟩ and ⟨C⁻²⟩, where ⟨···⟩ denotes an average over the radial direction weighted by f²(r) and geometrical factors. If C(r) = constant (so that the picture is wedge shaped), ε reduces to a pure number of order unity. If the picture is very jagged, so that ⟨C⁻²⟩ ≫ ⟨C⁻¹⟩², then ε = ⟨C⁻¹⟩²⟨C⁻²⟩⁻¹.
How can we use this analysis to find out if the visual system is actually an
efficient device that locates minima or "best matches" (as simple neural network
models would suggest)? Suppose the experiment outlined is done and the measured
efficiency comes out to be equal to the value predicted by steepest descent. That
would offer dramatic support for the idea of "best match." On the other hand, if
the efficiency is measured to be greater than the predicted efficiency, that would
rule out or at least cast grave doubt on the "best match" theory. Unfortunately, the
situation is complicated by the possibility that information is lost by processing,
for instance, by feature detectors. Thus, one would have to consider the steepest descent approximation in evaluating the integral over ρ not in P(φ|φ₀), but in P(σ|φ₀) = ∫ Dφ P(σ|φ)P(φ|φ₀) (with σ denoting some feature "tokens"). For the calculation outlined here to be relevant, we have to suppose that processing affects both (d')²_steepest descent and (d')² in the same proportion so that the effect cancels out in ε. Note, however, that the experiment can be repeated and the theoretical expression for ε can be evaluated (numerically at least) for a wide variety of prototype pictures φ₀.
SUMMARY
In summary, we have outlined a systematic program[5] to address some quantitative
issues that we must resolve in order to understand visual perception. An important
point is that images used in vision experiments should be generated from statistical
ensembles that may be formulated analytically. Advances in high-speed compu-
tation should make possible this type of controlled experiment and at the same
time facilitate the analysis of models of how the human perceptual system tries to
determine these statistical ensembles.
SOME QUESTIONS
In conclusion, I would like to pose the following list of questions as a challenge to
vision researchers.
1. What is the optimal performance allowable for various perceptual tasks?
2. What are the computations needed to reach this optimal performance? Can we
identify the issues involved (for instance, local versus non-local computation)?
3. Is the visual system actually capable of this optimal performance? How close
does it come? (These questions can be answered only by experiments, of course.)
4. If the performance of the visual system approximates optimal performance, how
does it perform the computations identified in question 2 above? What neural
circuitry and algorithm can carry out these computations? Can various simple
models be ruled out?
5. Does the visual system operate by optimization? Is the performance reached
by the steepest descent approximation in accordance with observation?
6. Are there universal features and properties in the sense of statistical physics?
As is evident by the preceding discussion, we have touched only on the begin-
nings of this program and have reached only qualitative conclusions at best. Many
challenging problems remain.
[5]A somewhat fuller account of the discussion presented here may be found in Zee.16
ACKNOWLEDGMENTS
I am indebted to W. Bialek for numerous stimulating and interesting discussions.
This research was supported in part by the National Science Foundation under
Grant No. PHY82-17853, supplemented by funds from the National Aeronautics
and Space Administration, at the University of California at Santa Barbara.
REFERENCES
1. Albrecht, D. G., ed. Recognition of Pattern and Form. New York: Springer-
Verlag, 1982.
2. Barlow, H. B. "The Past, Present, and Future of Feature Detectors." In
Albrecht,1 4.
3. Barlow, H. B. "The Absolute Efficiency of Perceptual Decision." Phil. Trans.
Roy. Soc. London B290 (1980):71.
4. Bialek, W., and A. Zee. Phys. Rev. Lett., 58 (1987):741.
5. Bialek, W., and A. Zee. "Understanding the Efficiency of Human Percep-
tion." Phys. Rev. Lett., 61 (1988):1512.
6. Bialek, W., and A. Zee. "Inadequacy of Mean Field Approximation in Visual
Perception." In preparation.
7. Bialek, W., and A. Zee. "Invariant Perception: A Functional Integral and
Field Theoretic Approach." In preparation.
8. Bialek, W., and A. Zee. "Recognizing Ensembles of Images: Universality at
Low Resolution." In preparation.
9. Bialek, W., and A. Zee. "Linear Filter Models in Visual Perception." In
preparation.
10. Campbell, F. W., and M. Lawden. "The Physics of Visual Perception." In
Albrecht,1 146.
11. DeValois, R. L. "Early Visual Processing: Feature Detection or Spatial Filter-
ing." In Albrecht.1
12. Kuffler, S. W. "Discharge Patterns and Functional Organization of Mam-
malian Retina." J. Neurophysiol. 16 (1953):57.
13. Levine, M. W., and J. M. Shefner. Fundamentals of Sensation and Percep-
tion. New York: Random House, 1981.
14. Marr, D. Vision. New York: W.H. Freeman & Co., 1982.
15. Parker, A., and M. Hawken. "Capabilities of Monkey Cortical Cells in Spatial
Resolution Tasks." J. Opt. Soc. Am. 2 (1985):1101.
16. Zee, A. "Some Quantitative Issues in the Theory of Perception." In Evolu-
tion, Learning, and Cognition, edited by Y.C. Lee. Singapore: World Scien-
tific, 1989.
V Probability, Entropy, and
Quantum
Asher Peres
Department of Physics, Technion—Israel Institute of Technology, 32 000 Haifa, Israel
INTRODUCTION
Thermodynamics, relativity and quantum theory are the three pillars upon which
the entire structure of theoretical physics is built. They are not branches of physics
(like acoustics, optics, etc.) but general frameworks encompassing every aspect of
physics. Thermodynamics—for which a more appropriate name would have been
"thermostatics"—governs the convertibility of various forms of energy; relativity
theory deals with measurements of space and time; and quantum theory is a set of
rules for computing probabilities of outcomes of tests (also called "measurements")
following specified preparations.14
ENTROPY
The purpose of this section is to prove that von Neumann's definition of entropy is
equivalent to that of standard thermodynamics. I hope that this proof will be more
readable, and also more convincing, than the one found in von Neumann's classic
book.15 I shall use for this proof some recent results due to Partovi.9
Thermodynamic Constraints on Quantum Axioms 347
S = −N Σ_j c_j log c_j ,  (1)
where N is the total number of molecules, and c_j is the concentration of the jth
species. (Units are chosen so that Boltzmann's constant equals 1. Temperature is
therefore measured in energy units and entropy is dimensionless.) The derivation
of Eq. (1) relies on the possibility of making semipermeable membranes which are
transparent to type j molecules and opaque to all others. These membranes are used
as pistons in an ideal frictionless engine, immersed in an isothermal bath at temperature T, as sketched in Figure 1. It is easily shown21 that a reversible separation of the mixed gases must supply an amount of isothermal work −NT Σ_j c_j log c_j. This
work is converted into heat and released into the reservoir. Therefore the mixing
entropy is given by Eq. (1).
von Neumann's definition of entropy of a quantum state closely parallels the
above argument. It assumes that there are semipermeable membranes capable of
separating orthogonal states with 100% efficiency—this is indeed the operational
meaning of "orthogonal states." The fundamental problem is whether it is legitimate
to treat quantum states in the same way as classical ideal gases, and in particular
why one should expect thermal equilibrium to be achieved.
In his proof, von Neumann15 relies on a subterfuge proposed by Einstein4 in
1914, in the early days of the "old" quantum theory. Consider many similarly pre-
pared quantum systems, such as Bohr's planetary atoms. Each one is enclosed in a
large box with impenetrable walls, so as to prevent any interaction between these
quantum systems. All these boxes are then placed into an even larger container,
where they behave as an ideal gas, because each box is so massive that classical
mechanics is valid for its motion (i.e., there is no need of Bohr-Sommerfeld quan-
tization rules—remember that we are in 1914). The container itself has ideal walls
which may be, according to our needs, perfectly conducting, perfectly insulating,
or with properties equivalent to those of semipermeable membranes. These "mem-
branes" are endowed with automatic devices able to peek inside the boxes and to
test the state of the quantum systems enclosed therein. In his book,15 von Neumann
insists (p. 359) that the practical infeasibility of this contraption does not impair
its demonstrative power: "In the sense of phenomenological thermodynamics, each
conceivable process constitutes valid evidence, provided that it does not conflict
with the two fundamental laws of thermodynamics." He then shows that Eq. (1)
can be recast in the form
S = −N Tr(ρ log ρ) ,  (2)

where ρ is the density matrix representing the state of a molecule of our gas. The c_j of Eq. (1) correspond to the eigenvalues of ρ.
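The correspondence between Eqs. (1) and (2) is easy to check numerically. The following Python sketch (an illustration added here, with NumPy assumed) computes −Tr ρ log ρ from the eigenvalues of ρ and verifies that a diagonal density matrix reproduces the classical mixing entropy of Eq. (1):

```python
import numpy as np

def von_neumann_entropy(rho):
    """S = -Tr(rho log rho), computed from the eigenvalues of rho."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]            # 0 log 0 = 0 by convention
    return float(-np.sum(w * np.log(w)))

# A diagonal density matrix reproduces the classical mixing entropy, Eq. (1):
# the concentrations c_j become the eigenvalues of rho.
c = np.array([0.5, 0.3, 0.2])
rho = np.diag(c)
S_mix = -np.sum(c * np.log(c))
assert np.isclose(von_neumann_entropy(rho), S_mix)
```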
The problem remains whether this hybrid classical-quantal reasoning is con-
sistent. My purpose here is to give a genuinely quantal proof of the equivalence of
FIGURE 1 Ideal engine used to separate gases A (to the left) and B (to the right). The
vertically and horizontally hatched semipermeable pistons are transparent to gases A
and B, respectively. The mechanical work that must be supplied in order to transform
the initial state into the final state is released as heat into the thermal bath.
von Neumann's entropy, Eq. (2), with the ordinary entropy of classical thermodynamics.
A mixed state can be prepared by the following recipe: Let a random process have probability λ to "succeed" and probability (1 − λ) to "fail." In case of success, prepare the quantum system according to ρ1. In case of failure, prepare it according to ρ2. This process results in a ρ given by

ρ = λ ρ1 + (1 − λ) ρ2 .  (3)
Indeed, if the above instructions are executed a large number of times, the average value obtained for subsequent measurements of any observable A is

⟨A⟩ = λ Tr(ρ1 A) + (1 − λ) Tr(ρ2 A) = Tr(ρ A) .  (4)
What I find truly amazing in this result is that, once ρ is given, it contains all the available information and it is impossible to reconstruct from it ρ1 and ρ2!
For example, if we prepare a large number of polarized photons and if we toss a
coin to decide, with equal probabilities, whether the next photon to be prepared
will have vertical or horizontal linear polarization, or, in a different experimental
setup, we likewise randomly decide whether each photon will have right-handed or
left-handed circular polarization, we get in both cases the same
ρ = (1/2) [ 1  0
            0  1 ] .
An observer receiving megajoules of these photons will never be able to discover
which one of these two methods was chosen for their preparation, notwithstanding
the fact that these preparations are macroscopically different. (If this were not
true, EPR correlations would allow instantaneous transfer of information to distant
observers, in violation of relativistic causality.6)
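The indistinguishability of the two preparations can be verified directly. A short Python illustration (not part of the original argument; the polarization states are written as standard Jones vectors):

```python
import numpy as np

# Polarization states as Jones vectors.
vert  = np.array([1, 0], dtype=complex)
horiz = np.array([0, 1], dtype=complex)
right = np.array([1,  1j], dtype=complex) / np.sqrt(2)
left  = np.array([1, -1j], dtype=complex) / np.sqrt(2)

def projector(psi):
    return np.outer(psi, psi.conj())

# Preparation 1: coin toss between vertical and horizontal linear polarization.
rho_lin  = 0.5 * projector(vert)  + 0.5 * projector(horiz)
# Preparation 2: coin toss between right- and left-handed circular polarization.
rho_circ = 0.5 * projector(right) + 0.5 * projector(left)

# Both give the maximally mixed state (1/2) * identity:
# no measurement whatever can tell the two preparations apart.
assert np.allclose(rho_lin, np.eye(2) / 2)
assert np.allclose(rho_circ, np.eye(2) / 2)
```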
Another example would be to prepare photons having, with equal probabili-
ties, linear vertical polarization or circular right-handed polarization. An observer
requested to guess what was the preparation of a particular photon, under the best
conditions allowed by quantum theory, would be able to give the answer with cer-
tainty in only 29.3% of cases.7,11 It will be shown below that a "superobserver"
who could always give an unambiguous answer would also be able to extract an
infinite amount of work from an isothermal reservoir.
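The 29.3% figure can be reproduced from the result cited above,7,11 which gives 1 − |⟨φ|ψ⟩| as the optimal probability of an unambiguous answer for two equiprobable pure states. A Python check (an illustration, assuming that formula):

```python
import numpy as np

vert   = np.array([1, 0], dtype=complex)                 # linear vertical
circ_R = np.array([1, 1j], dtype=complex) / np.sqrt(2)   # right-handed circular
overlap = abs(np.vdot(vert, circ_R))                     # |<phi|psi>| = 1/sqrt(2)
p_success = 1 - overlap                                  # optimal unambiguous discrimination
print(round(p_success, 3))                               # 0.293
```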
ρ = Z⁻¹ exp(−βH) ,  (6)
It can then be shown that the quantity S − β⟨E⟩ cannot decrease as a result of the collision. This follows from conservation of energy and convexity of entropy.16 Note that in Eq. (7), ρ and H0 refer to the quantum system, but β refers to the thermal reservoir with which it collided.
We further assume that the thermal reservoir is so large that its state after the collision can again be described by Eq. (6), with the same β, and in particular that it is justified to ignore its correlation to the state of the quantum system. We therefore expect that, after numerous collisions, S − β⟨E⟩ will reach the maximum value allowed by selection rules. If all the states of the quantum system are accessible (i.e., if there are no selection rules), the maximum value of S − β⟨E⟩ corresponds to a Gibbs state of the quantum system, at the same temperature β⁻¹ as the reservoir. Not every state, however, may be accessible. In particular, if the container
has passive walls interacting only with the R degrees of freedom but not with the
q variables, the internal state of the quantum system cannot be affected. In that
case, an ensemble of quantum systems described by Eq. (5) has the same statistical
properties as a classical ideal gas of free particles of mass M. In particular, it exerts
exactly the same pressure on the walls of the container. This is an immediate consequence of the evolution equation of the Wigner distribution function,18 which, for free particles, is identical to the Liouville equation in classical statistical mechanics.
Up to this point, nothing has been proved that has any consequence in the
real world. Only the fictitious degrees of freedom R were thermalized by multiple
collisions with the fictitious container. The situation becomes more interesting if
semipermeable partitions are introduced. As explained above, these partitions are
described by an interaction term involving q, R, and X (note that the classical
parameters X are prescribed functions of time and that their time dependence
must be extremely slow on time scales relevant to the quantum system; otherwise
the Born-Oppenheimer approximation would not be valid).
To describe a semipermeable partition, we have to add to the right-hand side
of Eq. (5) a term which, in the simplest case, has the form V(q, R, X). This term, if
suitably chosen, causes the formation of correlations between the variables q and R.
For example, we can concentrate particles with spin up in one part of the container
and those with spin down in the other part, by introducing an interaction
or heat transfer. We thereby obtain a mixture of the two polarization states. Its density matrix is

ρ = (1/4) [ 3  1
            1  1 ] .
The eigenvalues of ρ are 0.854 (corresponding to photons polarized at 22.5° from the
vertical) and 0.146 (for the opposite polarization). We now replace the "unusual"
membranes by ordinary ones, selecting these two orthogonal polarization states. The
next step is an isothermal compression, leading to state (d) where both chambers
have the same pressure and the same total volume as those in state (a). This
isothermal compression requires an expenditure of work
—nT (0.146 log 0.146 + 0.854 log 0.854) = 0.416 nT, (10)
which is released as heat into the reservoir. This is less than the amount nT log 2—
which was gained in the isothermal expansion from (a) to (b)—by the amount
0.277 nT. Finally, no work is involved in returning from (d) to (a) by suitable
rotations of polarization vectors (see von Neumann,15 p. 366). We have thereby
demonstrated the existence of a closed cycle whereby heat is extracted from an
isothermal reservoir and converted into work, in violation of the second law of
thermodynamics.
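The numbers in this cycle are easy to verify. A Python sketch (illustrative; it assumes, as inferred from the 22.5° eigenvectors above, that the mixture is the equal-weight combination of vertical and 45° linear polarizations):

```python
import numpy as np

# Equal mixture of vertical and 45-degree linear polarizations (assumed composition).
vert = np.array([1.0, 0.0])
diag = np.array([1.0, 1.0]) / np.sqrt(2)
rho = 0.5 * np.outer(vert, vert) + 0.5 * np.outer(diag, diag)

w = np.linalg.eigvalsh(rho)
print(np.round(w, 3))                      # [0.146 0.854]

# Work of isothermal compression, Eq. (10), in units of nT:
W_comp = -(0.146 * np.log(0.146) + 0.854 * np.log(0.854))
print(round(W_comp, 3))                    # 0.416
# Net work extracted per cycle, in units of nT:
print(round(np.log(2) - W_comp, 3))        # 0.277
```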
ρ = λ P_φ + (1 − λ) P_ψ ,  (11)
FIGURE 2 Cyclic process extracting heat from isothermal reservoir and converting it
into work, by using a semipermeable partition which selects non-orthogonal photon
states. Double arrows represent the linear polarizations of photon ensembles. The
symbol V is for vacuum.
where 0 < λ < 1 and where P_φ and P_ψ are projection operators on the pure states φ and ψ, respectively. The nonvanishing eigenvalues of ρ are

w_j = 1/2 ± [1/4 − λ (1 − λ)(1 − x)]^{1/2} ,  (12)

where x = |⟨φ|ψ⟩|². The entropy of this mixture, S = −Σ_j w_j log w_j, satisfies
dS/dx < 0 for any λ. Therefore, if the pure quantum states evolve as φ(0) → φ(t) and ψ(0) → ψ(t), the entropy of the mixture ρ shall not decrease (i.e., that mixture shall not become less homogeneous) provided that

|⟨φ(t)|ψ(t)⟩|² ≤ |⟨φ(0)|ψ(0)⟩|² .  (13)

In particular, if ⟨φ(0)|ψ(0)⟩ = 0, we must have ⟨φ(t)|ψ(t)⟩ = 0. Orthogonal states must remain orthogonal.
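The monotonic dependence of S on the overlap x can be checked numerically. A Python sketch (illustrative) evaluating the eigenvalues of Eq. (12) and the entropy of the mixture:

```python
import numpy as np

def mixture_entropy(lam, x):
    """Entropy of rho = lam*P_phi + (1-lam)*P_psi, overlap x = |<phi|psi>|^2 (Eq. 12)."""
    r = np.sqrt(0.25 - lam * (1 - lam) * (1 - x))
    w = np.array([0.5 + r, 0.5 - r])        # the two nonvanishing eigenvalues
    w = w[w > 1e-15]
    return float(-np.sum(w * np.log(w)))

# dS/dx < 0 for any lam: the entropy falls monotonically as the overlap grows.
xs = np.linspace(0.0, 0.999, 200)
for lam in (0.2, 0.5, 0.8):
    S = np.array([mixture_entropy(lam, x) for x in xs])
    assert np.all(np.diff(S) < 0)
```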
Consider now a complete orthogonal set φ_k. We have, for every ψ,

Σ_k |⟨φ_k|ψ⟩|² = 1 .  (14)
ACKNOWLEDGMENT
This work was supported by the Gerard Swope Fund and by the Fund for Encour-
agement of Research at Technion.
REFERENCES
1. Chirikov, B. V. "Transient Chaos in Quantum and Classical Mechanics."
Found. Phys. 16 (1986):39-49.
2. Datta, A., and D. Home. "Quantum Non-Separability Versus Local Realism:
A New Test using the B°B° System." Phys. Lett. A 119 (1986):3-6.
3. de Broglie, L. Une Tentative d'Interprétation Causale et Non Linéaire de la Mécanique Ondulatoire. Paris: Gauthier-Villars, 1956.
4. Einstein, A. "Beiträge zur Quantentheorie." Verh. Deut. Phys. Gesell. 16
(1914):820-828.
5. Einstein, A. "Quantentheorie der Strahlung." Phys. Z. 18 (1917):121-128.
6. Herbert, N. "FLASH-A Superluminal Communicator Based Upon a New Kind of Quantum Measurement." Found. Phys. 12 (1982):1171-1179.
7. Ivanovic, I. D. "How to Differentiate Between Non-Orthogonal States." Phys.
Lett. A 123 (1987):257-259.
8. Landé, A. Foundations of Quantum Theory. New Haven: Yale Univ. Press,
1955, 10-13.
9. Partovi, M. H. "Quantum Thermodynamics." Phys. Lett. A 137 (1989):440-444; see also contribution to the present volume.
10. Peres, A. "Relativity, Quantum Theory, and Statistical Mechanics are Com-
patible." Phys. Rev. D 23 (1981):1458-1459.
11. Peres, A. "How to Differentiate Between Non-Orthogonal States." Phys. Lett.
A 128 (1988):19.
12. Rosen, N., "On Waves and Particles." J. Elisha Mitchell Sci. Soc. 61 (1945):
67-73.
13. Shannon, C. "A Mathematical Theory of Communication." Bell Syst. Tech.
J. 27 (1948):379-423, 623-655.
14. Stapp, H. P. "The Copenhagen Interpretation." Am. J. Phys. 40 (1972):1098-
1116.
15. von Neumann, J. Mathematical Foundations of Quantum Mechanics. Prince-
ton, NJ: Princeton Univ. Press, 1955, 358-379.
16. Wehrl, A. "General Properties of Entropy." Rev. Mod. Phys. 50 (1978):221-
260.
17. Weinberg, S. "Particle States as Realizations (Linear and Nonlinear) of Space-time Symmetries." Nucl. Phys. B (Proc. Suppl.) 6 (1989):67-75.
18. Wigner, E. P. "On the Quantum Correction for Thermodynamic Equilib-
rium." Phys. Rev. 40 (1932):749-759.
19. Wigner, E. P. Group Theory. New York: Academic Press, 1959, 233-236.
20. Wootters, W. K., and W. H. Zurek. "A Single Quantum Cannot Be Cloned."
Nature 299 (1982):802-803.
21. Zemansky, M. W. Heat and Thermodynamics. New York: McGraw-Hill, 1968,
561-562.
Entropy and Quantum Mechanics

M. Hossein Partovi
Department of Physics, California State University, Sacramento, California 95819
Entropy is a natural and powerful idea for dealing with fundamental prob-
lems of quantum mechanics. Recent results on irreversibility and quantum
thermodynamics, reduction and entropy increase in measurements, and the
unification of uncertainty and entropy demonstrate the fact that entropy is
the key to resolving some of the long-standing problems at the foundations
of quantum theory and statistical mechanics.
INTRODUCTION
A distinctive feature of quantum theory is the highly nontrivial manner in which
information about the quantum system is inferred from measurements. This feature
is obscured in most discussions by the assumption of idealized measuring devices
and pure quantum states. While for most practical purposes these are useful and
reasonable approximations, it is important in dealing with fundamental issues to
recognize their approximate nature. This recognition follows from the simple observation that in general measuring devices cannot fully resolve the spectrum of the physical observable being measured, a fact that is self-evident in the case of observables with continuous spectra. A consequence of this remark is that in general realizable quantum states cannot be pure and must be represented as mixed states.1 Equivalently, a quantum measurement in general is incomplete in that it
fails to provide an exhaustive determination of the state of the system. The problem
of incomplete information, already familiar from statistical mechanics, communica-
tion theory and other areas, is thus seen to lie at the very heart of quantum theory.
In this sense, quantum mechanics is a statistical theory at a very basic level, and
there should be little doubt that entropy, a key idea in dealing with incomplete in-
formation, should turn out to play a central role in quantum mechanics as well. The
main purpose of the following account is to demonstrate this assertion by means of
examples drawn from recent work on the subject.
ENTROPY
To define entropy at the quantum level, we shall start with the notion of entropy as-
sociated with the measurement of an observable, the so-called measurement entropy.
We shall then show that ensemble entropy, given by the well-known von Neumann
formula, follows from our definition of measurement entropy by a straightforward
reasoning. Later, we will establish the identity of this ensemble entropy with ther-
modynamic entropy, so that there will be no distinction between "information" and
"physical" entropies (other than the fact that, strictly speaking, the latter is only
defined for equilibrium states).
In general, a measurement/preparation process involves a measuring device
D designed to measure some physical observable A. Let ρ̂ be the density matrix representing the state of the system and Â the operator representing observable A. Thus the quantum system is a member of an ensemble of similarly produced copies, some of which are subjected to interaction with the measuring device and serve to determine the state of the ensemble. The measuring device, on the other hand, involves a partitioning of the range of possible values of A into a (finite) number of bins, {a_i}, and for each copy of the system measured, determines in which bin the system turned up. In this way, a set of probabilities {P_i^A} is determined. What
becomes of the state of the copies that actually interact with the measuring device
is an important question (to be discussed later), but one that is distinct from
the issue of the state of the ensemble. Indeed many measurement processes are
partially or totally destructive of the measured copies of the system. The purpose
of the measurement/preparation process is thus to gain information about the state
of the ensemble by probing some of its members, often altering or even destroying
the latter in the process.
The fact that A represents a physical observable ensures that any partition of its spectrum generates a similar (orthogonal) partition of the Hilbert space, given by a complete collection of projection operators {π̂_i^A} in one-to-one correspondence with the bins {a_i}.
Entropy and Quantum Mechanics 359
S^A(ρ̂) = −Σ_i P_i^A ln P_i^A ,  (1)
One can show that the right-hand side of Eq. (2) is realized for Â = ρ̂. The corresponding minimum is then found to be the von Neumann expression for ensemble entropy, −Tr ρ̂ ln ρ̂. Starting from the elementary definition (1) for measurement entropy, we have thus arrived at the standard expression for ensemble entropy. We shall show later that S(ρ̂) coincides with the thermodynamic entropy, assuring us that information entropy and physical entropy are the same. For these reasons, we shall refer to the ensemble entropy, S(ρ̂), as the Boltzmann-Gibbs-Shannon (BGS) entropy also.
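The minimization just described can be illustrated numerically: measuring in the eigenbasis of ρ̂ attains −Tr ρ̂ ln ρ̂, while any other complete orthogonal measurement yields a larger measurement entropy. A Python sketch (an added illustration; the inequality it checks, that projective measurement entropy is bounded below by the von Neumann entropy, is standard):

```python
import numpy as np
rng = np.random.default_rng(0)

def measurement_entropy(rho, basis):
    """-sum_i P_i ln P_i, with P_i = <b_i|rho|b_i>, basis vectors in the columns of `basis`."""
    P = np.real(np.einsum('ji,jk,ki->i', basis.conj(), rho, basis))
    P = P[P > 1e-12]
    return float(-np.sum(P * np.log(P)))

rho = np.diag([0.7, 0.2, 0.1]).astype(complex)
S_vn = -sum(p * np.log(p) for p in (0.7, 0.2, 0.1))   # -Tr rho ln rho

# The eigenbasis of rho attains the von Neumann entropy ...
assert np.isclose(measurement_entropy(rho, np.eye(3, dtype=complex)), S_vn)
# ... and every other sharp (complete, orthogonal) measurement gives a larger value.
for _ in range(100):
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
    assert measurement_entropy(rho, Q) >= S_vn - 1e-9
```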
QUANTUM THERMODYNAMICS
Are the laws of thermodynamics—equivalently, any of the postulates commonly
adopted as the basis of statistical mechanics—independent laws of nature, or do
they in fact follow from the underlying dynamics? Ever since Boltzmann's brilliant
attempt at deriving thermodynamics from dynamics by means of his H-theorem,
there have been countless attempts at resolving this issue.11 We believe the question has now been settled at the quantum level,7 and it is our purpose here to define
thermodynamics for quantum systems and describe how the zeroth and second
laws actually follow from quantum dynamics without any further postulates (the
first and third laws are direct consequences of dynamical laws and need not be
considered).
Entropy and Quantum Mechanics 361
ΔS_a − β_b ΔU_a ≥ 0 .  (3)

Here ΔS_a and ΔU_a are the changes in the entropy and energy of system a, and β_b is the parameter characterizing the initial Gibbs state of system b. It is important to realize that the inequality in Eq. (3) is a nonequilibrium result, since, except
for the initial state of system b, all other states (including the final state of b)
will in general be nonequilibrium states. Furthermore, there is no implication in
Eq. (3) that the changes in entropy or energy of either system are in any way small.
Finally, appearances notwithstanding, the left-hand side of Eq. (3) is not related
to a change in the Helmholtz free energy of system a (a quantity which is only
defined for equilibrium states; besides, β_b is a property of the initial state of b and has nothing to do with system a).
The zeroth law can now be obtained from Eq. (3) by considering both systems a and b to be initially in Gibbs states. Then one has ΔS_a − β_a ΔU_a ≤ 0 as well as ΔS_b − β_b ΔU_b ≤ 0. These combine to give β_a ΔU_a + β_b ΔU_b ≥ ΔS_a + ΔS_b ≥ 0. Since ΔU_a + ΔU_b = 0 (conservation of energy), one has (β_a − β_b) ΔU_a ≥ 0. This inequality implies that the flow of energy is away from the system with the smaller value of the parameter. With β identified as inverse temperature, and the property established earlier that Gibbs states with the same value of β do not change upon interaction, we have arrived at the zeroth law of thermodynamics (note that in our units Boltzmann's constant equals unity).
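The direction of energy flow can be seen in a toy model: two two-level systems prepared in Gibbs states and coupled by an energy-conserving partial-swap unitary. The following Python sketch is an illustration added here (the partial-swap coupling is this sketch's assumption, not part of the original derivation):

```python
import numpy as np

def gibbs(beta, H):
    """Gibbs state diag(exp(-beta*E))/Z for a diagonal Hamiltonian H."""
    w = np.exp(-beta * np.diag(H))
    return np.diag(w / w.sum())

H = np.diag([0.0, 1.0])          # identical two-level Hamiltonians for a and b
I4 = np.eye(4)
SWAP = I4[[0, 2, 1, 3]]          # exchanges the two subsystems; commutes with H_a + H_b
theta = 0.3
U = np.cos(theta) * I4 + 1j * np.sin(theta) * SWAP   # energy-conserving partial swap

def dU_a(beta_a, beta_b):
    """Energy change of system a after one collision with b."""
    rho = np.kron(gibbs(beta_a, H), gibbs(beta_b, H))
    rho_f = U @ rho @ U.conj().T
    rho_a = rho_f.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)   # trace out b
    return float(np.real(np.trace(H @ rho_a) - np.trace(H @ gibbs(beta_a, H))))

# (beta_a - beta_b) * dU_a >= 0: energy flows away from the smaller-beta (hotter) system.
for ba, bb in [(0.5, 2.0), (2.0, 0.5), (1.0, 1.0)]:
    assert (ba - bb) * dU_a(ba, bb) >= -1e-12
```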
To derive the second law, consider a cyclic change of state for system a brought about by interaction with a number of systems b_i which are initially in equilibrium at inverse temperatures β_i. Each interaction obeys inequality (3), so that ΔS_ai − β_i ΔU_ai ≥ 0 for the ith interaction. Since in a cyclic change ΔS = ΔU = 0, it follows that Σ_i ΔS_ai = 0. Summing the inequality stated above on the index i, one arrives at

Σ_i β_i ΔU_ai ≤ 0 .  (4)
This inequality is a precise statement of the Clausius principle. Note that in conventional terms ΔU_ai would be the heat absorbed from system b_i, as explained earlier. Note also that system a need not be in equilibrium at any time during the cycle, and that the β_i only refer to the initial states of the systems b_i.
The Clausius principle established above is equivalent to the second law of
thermodynamics, and the entropy function defined from it is none other than the
one we have been using, namely the BGS entropy.
Further results on the approach to equilibrium, the unique role of the canonical ensemble in quantum thermodynamics, and the calculation of the rate of approach to equilibrium in a specific example can be found in Partovi.7
Here ω_i represents that state of the device which corresponds to the value of A turning up in the bin a_i. By contrast, Ω' represents a state of the device which corresponds to the state of the system being of non-diagonal form. Now in a proper measurement, such non-diagonal contributions are never observed, i.e., Ω' is absent, and all one sees of Ω(T) is the reduced part Ω^R. This disappearance of the off-diagonal contribution Ω' constitutes the crux of the measurement problem. We will now describe how interaction with the environment in fact serves to eliminate Ω' and leave Ω^R as the final state of the system-device complex.
To establish the result just stated, first we need a theorem on the decay of correlations. Let the correlation entropy, C_AB, between two systems A and B be defined as the difference S_A + S_B − S_AB. Note that C_AB is non-negative, vanishing only when the two systems are uncorrelated, i.e., when ρ_AB = ρ_A ρ_B. Now consider four systems A, B, C and D, initially in the state ρ_ABCD(0) = ρ_AB(0) ρ_C(0) ρ_D(0). The notation implies that systems A and B are initially correlated while all other pairs are initially uncorrelated. Starting at t = 0, system A interacts with system C while system B interacts with system D. Then, using a property of the BGS entropy known as strong subadditivity,4 one can show that C_AB(t) ≤ C_AB(0). In other words, interactions with other systems will in time serve to decrease the correlations initially present between A and B. This intuitively "obvious" result is actually a highly nontrivial theorem that depends on the strong subadditivity property of entropy, itself a profound property of the BGS entropy.
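The decay of correlations can be seen in a minimal example: a fully correlated pair AB (a Bell state) whose members each interact with an uncorrelated environment qubit. The following Python sketch is an added illustration (the partial-swap interaction is an assumption of this sketch, chosen only because it is a simple coupling):

```python
import numpy as np

def S(rho):
    """von Neumann entropy from eigenvalues."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

# Bell pair AB; C and D are uncorrelated, maximally mixed environment qubits.
bell = np.zeros(4); bell[0] = bell[3] = 1 / np.sqrt(2)
rho_ABCD = np.kron(np.outer(bell, bell), np.eye(4) / 4)          # qubit order A,B,C,D

# Reorder qubits to (A, C, B, D) so that each interacting pair is adjacent.
t = rho_ABCD.reshape([2] * 8).transpose(0, 2, 1, 3, 4, 6, 5, 7)
rho0 = t.reshape(16, 16)

# Partial-swap couplings A<->C and B<->D.
theta = 0.4
SW = np.eye(4)[[0, 2, 1, 3]]
U2 = np.cos(theta) * np.eye(4) + 1j * np.sin(theta) * SW
U = np.kron(U2, U2)

def C_AB(rho):
    """Correlation entropy S_A + S_B - S_AB; qubit order (A, C, B, D)."""
    t = rho.reshape([2] * 8)
    rho_AB = np.einsum('acbdxcyd->abxy', t).reshape(4, 4)        # trace out C and D
    r4 = rho_AB.reshape(2, 2, 2, 2)
    rho_A = np.einsum('abxb->ax', r4)
    rho_B = np.einsum('abay->by', r4)
    return S(rho_A) + S(rho_B) - S(rho_AB)

before = C_AB(rho0)                      # 2 ln 2 for the Bell pair
after = C_AB(U @ rho0 @ U.conj().T)
assert 0 < after < before                # C_AB(t) <= C_AB(0)
```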
A measuring device, or more accurately the part of it that directly interacts
with the quantum system, has a very large cross section for interaction with the rest
of the universe, or its environment. Therefore, although the system-device interac-
tion ceases after the establishment of correlations in f2(T), the device continues to
interact with the environment. According to the result established above, on the
other hand, this causes the system-device correlations to decay, so that the final
value of the system-device correlation entropy will be the minimum consistent with
the prevailing conditions.
A closer examination of the structure of Ω(T) in Eq. (6), together with the conditions that the measuring device must obey, reveals that the minimum system-device correlation entropy is reached when Ω' = 0, i.e., when Ω(T) is in fact reduced to Ω^R, thus establishing the fact that it is the interaction with the environment which brings about the reduction of the state of the system. It is now clear why reduction appears to be totally inexplicable when viewed in the context of system-device interactions only.
The reduction process described above entails an entropy increase, given by ΔS = S(Ω^R) − S(Ω). A straightforward calculation of this entropy increase gives8

ΔS = S(Σ_i π̂_i ρ̂ π̂_i) − S(ρ̂) ,  (7)

with the obvious interpretation that the entropy increase comes about as a result of reducing the initial state of the system ρ̂ to the final state Σ_i π̂_i ρ̂ π̂_i, with the off-diagonal elements removed; cf. Eq. (5).
As an application of Eq. (7), we will consider the measurement (in one dimension) of the momentum of a system initially in a pure (hence idealized) Gaussian state with a momentum spread equal to μ. The measuring device will be assumed to have uniform bins of size Δp (roughly equal to the resolution of the momentum analyzer). Then one finds from Eq. (7) that ΔS = −Σ_i P_i ln P_i, where

P_i = (πμ²)^{−1/2} ∫ dp exp(−p²/μ²) .
Here the integral extends over the ith bin. Note that ΔS is precisely what we named measurement entropy before.
Consider now the following limiting values of ΔS. For a crude measurement, Δp ≫ μ, practically all events will turn up in one channel (or bin), say, channel k. Then we have P_k ≈ 1, P_i ≈ 0 for i ≠ k, and we find ΔS ≈ 0, exactly as expected. For a high-resolution analyzer, on the other hand, Δp ≪ μ, so that

P_i ≈ (πμ²)^{−1/2} Δp exp(−p_i²/μ²) ,
and we find

ΔS ≈ (1/2)(1 + ln π) + ln(μ/Δp) ,   (μ ≫ Δp). (8)
Thus the entropy increase for reducing the state of the system grows indefinitely
as the resolution of the momentum analyzer is increased. Again this is exactly as
expected, and points to the impossibility of producing pure states by means of a
(necessarily) finite preparation procedure.
It should be pointed out at this point that Eq. (7) actually represents a lower
limit to the amount of entropy increase in a measurement, and that the actual value
can be far larger than this theoretical minimum.
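Both limiting behaviors of ΔS, including Eq. (8), can be reproduced numerically. A Python sketch (illustrative; `delta_S` is a helper name introduced here):

```python
import numpy as np
from math import erf, log, pi

def delta_S(mu, dp, pmax=None):
    """Entropy increase -sum_i P_i ln P_i for a Gaussian of spread mu, binned with resolution dp."""
    if pmax is None:
        pmax = 12 * mu
    edges = np.arange(-pmax, pmax + dp, dp)
    # P_i = integral of (pi*mu^2)^(-1/2) exp(-p^2/mu^2) over the ith bin, via erf.
    P = np.array([0.5 * (erf(b / mu) - erf(a / mu))
                  for a, b in zip(edges[:-1], edges[1:])])
    P = P[P > 1e-300]
    return float(-np.sum(P * np.log(P)))

mu = 1.0
# Crude analyzer (dp >> mu): essentially one occupied bin, so dS ~ 0.
assert delta_S(mu, 50.0) < 1e-9
# High-resolution analyzer (dp << mu): dS approaches Eq. (8).
dp = 0.001
assert abs(delta_S(mu, dp) - (0.5 * (1 + log(pi)) + log(mu / dp))) < 1e-3
```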
CONCLUDING REMARKS
In the preceding sections we have described certain basic ideas about the role and
meaning of entropy in quantum mechanics, and have outlined a number of appli-
cations of these ideas to long-standing problems in quantum theory and statistical
mechanics. Among these are the quantum maximum uncertainty/entropy principle,
multitime measurements and time-energy uncertainty relations, the reversibility
problem of statistical mechanics, and the measurement problem of quantum the-
ory. On the basis of the results obtained so far (the details of which can be found in
the original papers cited above), it should be amply clear that entropy, properly de-
fined and applied, is a most powerful notion for dealing with foundational problems in quantum mechanics. As remarked earlier, this is because the manner in which
measurements yield information about a quantum system is unavoidably statisti-
cal in nature, thus entailing all the usual consequences of dealing with incomplete
information, including entropy.
In retrospect, it is rather remarkable how the dynamics of elementary, micro-
scopic systems of a few degrees of freedom can turn into a statistical problem of
considerable complexity when dealing with measured data.
ACKNOWLEDGMENTS
This work was supported by the National Science Foundation under Grant No.
PHY-8513367 and by a grant from California State University, Sacramento.
REFERENCES
1. Blankenbecler, R., and H. Partovi. "Uncertainty, Entropy, and the Statistical
Mechanics of Microscopic Systems." Phys. Rev. Lett. 54 (1985):373-376.
2. Deutsch, D. "Uncertainty in Quantum Measurements." Phys. Rev. Lett. 50
(1983):631-633.
3. Jaynes, E. T. "Information Theory and Statistical Mechanics." Phys. Rev.
106 (1957):620-630.
4. Lieb, E., and M. B. Ruskai. "A Fundamental Property of Quantum-
Mechanical Entropy." Phys. Rev. Lett. 30 (1973):434-436.
5. Partovi, H. "Entropic Formulation of Uncertainty for Quantum Measure-
ments." Phys. Rev. Lett. 50 (1983):1882-1885.
6. Partovi, H., and R. Blankenbecler. "Time in Quantum Measurements." Phys. Rev. Lett. 57 (1986):2887-2890.
7. Partovi, H. "Quantum Thermodynamics." Phys. Lett. A 137 (1989):440-444.
8. Partovi, H. "Irreversibility, Reduction, and Entropy Increase in Quantum
Measurements." Phys. Lett. A 137 (1989):445-450.
9. Peres, A. "When is a Quantum Measurement?" Am. J. Phys. 54 (1986):688-692.
10. Shannon, C. "A Mathematical Theory of Communication." Bell Syst. Tech.
J. 27 (1948):379-423, 623-655.
11. Wehrl, A. "General Properties of Entropy." Rev. Mod. Phys. 50 (1978):221-
260.
12. Zeh, H. D. "On the Irreversibility of Time and Observation in Quantum The-
ory." In Foundations of Quantum Mechanics, edited by B. d'Espagnat. New
York: Academic Press, 1971.
13. Zurek, W. H. "Environment-Induced Superselection Rules." Phys. Rev. D 26
(1982):1862-1880.
Einstein Completion of Quantum Mechanics Made Falsifiable

O. E. Rössler
Institute for Physical and Theoretical Chemistry, University of Tübingen, 7400 Tübingen, West Germany
The big mystery in the formalism of quantum mechanics is still the measure-
ment problem—the transition from the linear probability amplitude formalism to
the nonlinearly projected individual events.10 In contrast, the "relativistic measurement problem," in which everything is compounded by relativistic considerations, has so far resisted all attempts at formalization.1,2,6,12,16
In the context of the ordinary measurement problem, the paradigm of correlated photons3,4,11,19 has already proven an invaluable empirical tool. While the
two individual projection results remain probabilistic, they nevertheless are strictly
correlated across the pair—as if one and the same particle were available twice!
Therefore, the question arises of whether the same tool may not be transplanted
away from its original domain (that of confirming quantum mechanics) in order to
be used as a probe in the unknown terrain of relativistic quantum mechanics.
A similar proposal was once made, with disappointing results. Einstein8,5
had thought of subjecting two correlated particles to a condition in which both
are causally insulated (space-like separated) in order to, at leisure, collect from
each the result of a different projected property of the original joint wave function.
Since, in this way, two incompatible (noncommuting) measurement results could be
obtained from the same wave function, his declared aim was to "complete" quantum
mechanics in this fashion. To everyone's surprise, Bell5 was able to demonstrate that
the two particles remain connected "non-locally." They behave exactly as if the
distant measurement performed on the first had been performed twice, namely on
the second particle, too, in the form of a preparatory measurement. More technically
speaking, the distant measurement throws both particles into the same eigenstate
(reduction at a distance). The achievement of Bell was to show that this implication
of the quantum-mechanical formalism is indeed incompatible with any pre-existing
set of properties of the two particles that would make the effect at a distance only
an apparent one. A painstaking analysis of all relative angles and their attendant
correlations was the key step. Thus, Einstein's intuition was proven wrong for once.
No more than one "virgin reduction" of the original wave function need be assumed.
However, the mistake made by Einstein may have been smaller than meets the eye. His specific proposal to use relativistic insulation (space-like separation) as a means to "fool" quantum mechanics was presumably chosen for didactic reasons
only. The larger idea—to use relativity theory for the same purpose—is still unconsummated. There may exist a second mechanism of causal separation between two space-like separated events that, when applied to the two measurements, might indeed "decouple" them, so that quantum mechanics could be fooled indeed—or else would have to respond with an even more vigorous and surprising defense.
Such a second mechanism, in fact, exists, as is well known. The temporal ordering between two space-like separated events (their causal relationship, so to speak) is not a relativistic invariant. The very "connection" discovered by Bell makes this result, which ordinarily poses no threat to causality, conducive to carrying an unexpected power.
Let us illustrate the idea in concrete terms (Figure 1). The two measuring
stations used in the Aspect experiment3,4 are here assumed to have been put in
motion relative to each other. Moreover, the two distances are chosen so carefully
that exactly the above condition (reversal of priority relations as to which mea-
suring device is closer to the point of emission in its own frame) is fulfilled. In
consequence, each half experiment is identical with an ordinary Aspect experiment
in which the most important measurement (the first) has already taken place. Only
after this first reduction has been obtained in the frame in question will there be
a second measurement. This second measurement, of course, will be performed by
a moving (receding) measurement device. But, since by that time the joint reduction
Completion of Quantum Mechanics 369
of the photon's state has already been accomplished, the other photon already
possesses a well-defined spin. Hence, by the well-known fact that a photon's spin
yields the same fixed outcome whatever the state of head-on motion of the
measuring device,7 there is indeed no difference relative to an ordinary Aspect
experiment. A "catch-22" situation has therefore been achieved.
The question is open how nature will respond to this experiment. Let us, there-
fore, first check whether it can, in fact, be done (Figure 2). Here for simplicity
only two frames are assumed, one stationary and the other moving. Both are
completely symmetric. If, as shown, detector delays of 3 nanoseconds (d = 1 light
meter) are assumed, and if, in addition, satellite velocity (v = 11 km/sec) is assumed
for the second frame, so that v/c = 4 × 10⁻⁵ ≪ 1, one sees from the diagram that
s = d × c/v = 2.5 × 10⁴ m = 25 km ≈ 16 mi. This amounts to a rather "long-distance"
version of the Aspect experiment. The weak intensity of the source used in the
latter3,4 would certainly forbid such an extension. Moreover, the two photons are
not simultaneously emitted in this experiment.3,4,11 Therefore, it is fortunate
that a more recent experiment exists which is both high-intensity and of the simul-
taneously emitting type since the two photons are generated in parametric down
conversion before being superposed.13 Therefore, the present experiment can be
actually implemented in two steps, by first scaling up the Ou-Mandel experiment,
370 O. E. Rössler
FIGURE 2 The experiment of Figure 1, redrawn in more detail in the two frames x'
and x". Note that the slope of the x" axis equals v/c, but also is equal to d/s (with
d measured in meters) as long as v is much smaller than c. With d and v fixed, s (the
minimum distance to the source from either measuring device in its own frame) can,
therefore, be calculated. Compare text.
and by then making one of the two measuring devices (analyzer plus detector)
spacebound.
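The distance estimate above is simple arithmetic and can be checked directly; a quick sketch (all input values are the ones quoted in the text, c is the standard value):

```python
# Check of the arithmetic above (all inputs from the text; c standard).
c = 2.998e8    # speed of light, m/s
v = 11e3       # satellite velocity, m/s
d = 1.0        # detector delay of ~3 ns, i.e., one light-meter

beta = v / c   # ~3.7e-5, rounded to 4e-5 in the text
s = d * c / v  # minimum distance of each detector from the source, m

print(beta, s) # s is ~2.7e4 m; the text's 2.5e4 m uses the rounded beta
```

The rounding of v/c to 4 × 10⁻⁵ is what gives the text's 25 km figure; the exact ratio gives roughly 27 km, the same order either way.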
This essentially concludes the message of the present note. What remains is
to make the connection to other work. While the present experiment is new, Shi-
mony17 recently looked at a rather similar case. His mathematical analysis (done
without pictures) fits in perfectly as a complement to the present context. The
only difference: He did not differentiate between the two measuring devices being
mutually at rest or not. He, therefore, could rely entirely on the Bell experiment,
with his added analysis only having the character of a gedanken experiment that
cannot and need not be done since all the facts are available anyhow. His conclu-
sion nevertheless was quite revolutionary since it culminated in the conjecture that
the quantum-mechanical notion of a measured eigenstate may have to be redefined
such that it becomes frame-dependent.
Shimony's conclusion had been reached before by Schlieder16 and Aharonov
and Albert.2 These authors applied a well-known axiom from relativistic quantum
mechanics (that two space-like separated measurements always commute) to
correlated particles, arriving at the theorem that the same particle may possess
multiple quantum states (density matrices) at the same point in spacetime. Specifically,
these states form—in accordance with earlier proposals of Dirac, Tomonaga and
"save" the commutation relations (by. excluding joint reductions) and spell the
end of the doctrine of an observer-invariant spacetime (since the state of motion
of a measuring device could affect photon spin). However, it would be much too
"heavy" to be seriously proposed as a prediction. All quantitative theory available
would thereby be contradicted. There is nothing on the horizon that could seriously
threaten the invariance of quantum spacetime. What is new is only that the latter
has become empirically confirmable (and therefore also "in principle falsifiable")
for the first time.
To conclude, a new quantum experiment feasible with current technology has
been proposed. The status of the commutation relations in the relativistic
measurement problem can be decided. Specifically, the "space-borne" Ou-Mandel
experiment will show whether (1) the current idea that there exists an
observer-invariant quantum spacetime can be upheld (Einstein completion) or (2)
the observer is reinforced in his "participatory"20 role in a whole new context.
ACKNOWLEDGMENTS
I thank Wojciech Zurek and Jens Meier for discussions and John Bell for a correction
concerning Figure 2.
Added in Proof: In 1984 Peres15 used a diagram similar to Figure 1, which was
redrawn for this paper.
REFERENCES
1. Aharonov, Y., and D. Z. Albert. Phys. Rev. D24 (1981):359.
2. Aharonov, Y., and D. Z. Albert. Phys. Rev. D29 (1984):228.
3. Aspect, A., P. Grangier, and G. Roger. Phys. Rev. Lett. 49 (1982):91.
4. Aspect, A., J. Dalibard, and G. Roger. Phys. Rev. Lett. 49 (1982):1804.
5. Bell, J. S. Physics 1 (1964):195.
6. Bloch, I. Phys. Rev. 156 (1967):1377.
7. Bjorken, J. D., and S. D. Drell. Relativistic Quantum Mechanics. New York:
McGraw Hill, 1964.
8. Einstein, A. In Institut International de Physique Solvay, Rapport et Discus-
sions du 5e Conseil. Paris, 1928, 253.
9. Einstein, A., B. Podolsky, and N. Rosen. Phys. Rev. 47 (1935):777.
10. Jammer, M. The Philosophy of Quantum Mechanics, the Interpretations of
Quantum Mechanics in Historical Perspective. New York: Wiley, 1974.
11. Kocher, C. A., and E. D. Commins. Phys. Rev. Lett. 18 (1967):575.
12. Landau, L. D., and R. Peierls. Z. Physik 69 (1931):56.
13. Ou, Z. Y., and L. Mandel. Phys. Rev. Lett. 61 (1988):50.
14. Park, D., and H. Margenau. In Perspectives in Quantum Theory, edited by
W. Yourgrau and A. Van der Merwe. Boston: MIT Press, 1971, 37.
15. Peres, A. Amer. J. Phys. 52 (1984):644.
16. Schlieder, S. Commun. Math. Phys. 7 (1968):305.
17. Shimony, A. In Quantum Concepts in Space and Time, edited by R. Penrose
and C. J. Isham. Oxford: Clarendon, 1986, 182, 193-195.
18. von Neumann, J. In Mathematical Foundations of Quantum Mechanics.
Princeton: Princeton University Press, 1955, 225-230.
19. Wheeler, J. A. Ann. N. Y. Acad. Sci. 48 (1946):219.
20. Wheeler, J. A. "Genesis and Observership." In Foundational Problems in Spe-
cial Sciences, edited by R. E. Butts and K. J. Hintikka. Dordrecht: Reidel,
1977.
21. Zeeman, E. C. J. Math. Phys. 5 (1964):490.
J. W. Barrett
Department of Physics, The University, Newcastle upon Tyne NE1 7RU, United Kingdom
QUANTUM MECHANICS
One fairly standard view of quantum mechanics is the following (see the contri-
butions by Omnes and Gell-Mann in this volume): After irreversible coupling to
the environment, the properties of an object are consistent with the idea that one
of the alternatives for its behavior has occurred in a definite way. The clause "in
a definite way" is important here. I also want to stress that I am saying that the
properties of an object become consistent with the idea that something definite has
occurred, but no further; I am not pinning down a time at which one alternative is
chosen.
I would like to consider a universe consisting of a small number of finite quantum
systems, which we think of as exchanging information. There is a difficulty, because
if one system gains information about a second one through an interaction, there is
the possibility that the interaction might be "undone" later on, by some interaction
which makes use of the quantum correlations between the two systems. Thus, the
information gained has only a definite character so long as we forgo the further
"use" of some of the possible quantum interactions. This definiteness which I am
talking about is, I think, captured by the idea of a consistent logic, introduced by
Omnes, and having its roots in Griffiths' idea3 of a consistent history for a quantum
system.
In the real world, there is always a large-scale, decohering environment involved
when we, human beings, gain information: our own bodies, if nothing else. In most
cases, though, it is an inanimate part of the environment. Then, it becomes ef-
fectively impossible to show up the coherence between the small system and the
macroscopic object, because the experimental manipulations involved get too com-
plicated. They would involve a huge amount of information. Incidentally, this shows
up some sort of link between information, of an algorithmic kind (how to program a
robot doing the experiments), and the measure of correlation information discussed
by Everett1 which uses the density matrix.
Because the environment "decoheres" interactions in the real world, the present
discussion is optional from a strictly practical viewpoint. However, I am unhappy
with this resolution of the problem, and think that quantum mechanics ought to
make sense without the necessary inclusion of such very large systems.
this very literal way, one is bound to regain the idea of the set 0 for the two-
spin experiment, and not be able to reconstruct quantum mechanics. I think other
axioms of set theory have an interpretation in terms of physical operations based
on a pre-quantum understanding of physics.
Thus we have been accustomed to abandoning the goal of understanding quan-
tum objects entirely in terms of classical set-theoretic constructions, but speak
about them in roundabout ways. This is the source of the tension in debates about
quantum theory. Omnes has clarified exactly to what extent one can use set theo-
retic constructs in quantum theory in a direct way, and where the inconsistencies
set in. To my mind this is a very important advance. However, I feel that there ought
to be a set-theoretic language which applies directly to all quantum interactions.
Perhaps it is along the lines Finkelstein has suggested.2
disguised because of the law of large numbers)? This is the type of question which
Penrose raised when he invented spin networks.4
Keeping the macroscopic objects of finite size has other effects. The angle is
effectively a property of the spatial relationship between the polarizer and analyzer.
A finite size for these means a cutoff in the spectrum of the angular momentum for
each object, and hence some uncertainty in the relative angle between the two due
to the quantum uncertainty principle. Thus the bit string that one gets in this case,
from writing 0 when a photon fails to pass through and 1 if it does pass through,
does not define an angle in the classical sense. What I mean is that there is not a
continuum range of values which the angle, as defined by the quantum "surveying,"
can be said with certainty to take.
Thus we see that the continuum nature of space-time, the continuum nature of
the space of quantum wavefunctions, and the usual assumption of the existence of
infinitely large and massive reference bodies, are inextricably linked. In particular,
we see that the quantum wavefunction is not just a property of the photon spin. It
is a property of the space-time measurement as well as of the photon itself.
The implications of this for understanding the concept of information, in an
algorithmic sense, in quantum theory are something one cannot ignore. If one wants
to deal with a finite amount of information, one has to use systems of a finite size
throughout; then one cannot use continuum concepts such as a wavefunction in the
conventional sense. I feel that a satisfying resolution of this problem should also
be one that solves the puzzles I outlined earlier about the relationship of quantum
properties to classical sets.
380 J. W. Barrett
REFERENCES
1. DeWitt, B. S., and N. Graham. The Many-Worlds Interpretation of Quantum
Mechanics. Princeton: Princeton University Press, 1973.
2. Finkelstein, D. "Quantum Net Dynamics." Intl. J. Theor. Phys. (1989). To
appear.
3. Griffiths, R. B. "Correlations in Separated Quantum Systems: A Consistent
History Analysis of the EPR Problem." Am. J. Phys. 55 (1987):11-17.
4. Penrose, R. "Angular Momentum: An Approach to Combinatorial Space-
Time." In Quantum Theory and Beyond, edited by T. Bastin. London: Cam-
bridge University Press, 1971.
E. T. Jaynes
Wayman Crow Professor of Physics, Washington University, St. Louis, MO 63130
For some sixty years, it has appeared to many physicists that probability
plays a fundamentally different role in quantum theory than it does in
statistical mechanics and analysis of measurement errors. A common notion
is that probabilities calculated within a pure state have a different character
than the probabilities with which different pure states appear in a mixture
or density matrix. As Pauli put it, the former represents "...eine prinzipielle
Unbestimmtheit, nicht nur Unbekanntheit" (a fundamental indeterminacy, not mere
ignorance). But this viewpoint leads to so
many paradoxes and mysteries that we explore the consequences of the
unified view—all probability signifies only human information. We examine
in detail only one of the issues this raises: the reality of zero-point energy.
Hilbert space. But, fundamentally, every EM field is a source field from somewhere;
therefore, it is already an operator on the space of perhaps distant sources. So why
do we quantize it again, thereby introducing an infinite number of new degrees of
freedom for each of an infinite number of field modes?
One can hardly imagine a better way to generate infinities in physical
predictions than by having a mathematical formalism with (∞)² more degrees of freedom
than are actually used by Nature. The issue is: should we quantize the matter and
fields separately, and then couple them together afterward, or should we write down
the full classical theory with both matter and field and with the field equations in
integrated form, and quantize it in a single step? The latter procedure (assuming
that we could carry it out consistently) would lead to a smaller Hilbert space.
The viewpoint we are suggesting is quite similar in spirit to the Wheeler-
Feynman electrodynamics, in which the EM field is not considered to be a "real"
physical entity in itself, but only a kind of information storage device. That is,
the present EM field is a "sufficient statistic" that summarizes all the information
about past motion of charges that is relevant for predicting their future motion.
It is not enough to reply that "The present QED procedure must be right
because it leads to several very accurate predictions: the Lamb shift, the anoma-
lous moment, etc." To sustain that argument, one would have to show that the
quantized free field actually plays an essential role in determining those accurate
numbers (1058 MHz, etc.). But their calculation appears to involve only the
Feynman propagators; mathematically, the propagator D(x − y) in Eq. 1 is equally well
a Green's function for the quantized or unquantized field.
The conjecture suggests itself, almost irresistibly, that those accurate experi-
mental confirmations of QED come from the local source fields, which are coherent
with the local state of matter. This has been confirmed in part by the "source-field
theory" that arose in quantum optics about 15 years ago.1,15,21 It was found that, at
least in lowest nonvanishing order, observable effects such as spontaneous emission
and the Lamb shift, can be regarded as arising from the source field which we had
studied already in classical EM theory, where we called it the "radiation reaction
field." Some equations illustrating this in a simpler context are given below.
In these quantum optics calculations, the quantized free field only tags along,
putting an infinite uncertainty into the initial conditions (that is, a finite uncer-
tainty into each of an infinite number of field modes) and thus giving us an infinite
"zero-point energy," but not producing any observable electrodynamic effects. One
wonders, then: Do we really need it?
1927 vintage quantum theory, where the conceptual problems of the "Copenhagen
interpretation" refuse to go away, but are brought up for renewed discussion by
every new generation (much to the puzzlement, we suspect, of the older generation
who thought these problems were all solved). Starting with the debates between
Bohr and Einstein over sixty years ago, different ways of looking at quantum theory
persist in making some see deep mysteries and contradictions in need of resolution,
while others insist that there is no difficulty.
Defenders of the Copenhagen interpretation have displayed a supreme self-
confidence in the correctness of their position, but this has not enabled them to
give the rest of us any rational explanations of why there is no difficulty. Richard
Feynman at least had the honesty to admit, "Nobody knows how it can be that
way."
We doubters have not shown so much self-confidence; nevertheless, all these
years, it has seemed obvious to me—for the same reasons that it did to Einstein
and Schrödinger—that the Copenhagen interpretation is a mass of contradictions
and irrationality and that, while theoretical physics can of course continue to make
progress in the mathematical details and computational techniques, there is no hope
of any further progress in our basic understanding of Nature until this conceptual
mess is cleared up.
Let me stress our motivation: if quantum theory were not successful pragmat-
ically, we would have no interest in its interpretation. It is precisely because of
the enormous success of the QM mathematical formalism that it becomes crucially
important to learn what that mathematics means. To find a rational physical in-
terpretation of the QM formalism ought to be considered the top priority research
problem of theoretical physics; until this is accomplished, all other theoretical re-
sults can only be provisional and temporary.
This conviction has affected the whole course of my career. I had intended
originally to specialize in Quantum Electrodynamics, but this proved to be impos-
sible. Whenever I look at any quantum-mechanical calculation, the basic craziness
of what we are doing rises in my gorge and I have to try to find some different way
of looking at the problem that makes physical sense. Gradually, I came to see that
the foundations of probability theory and the role of human information have to
be brought in, and so I have spent many years trying to understand them in the
greatest generality.
The failure of quantum theorists to distinguish in calculations between several
quite different meanings of "probability," between expectation values and actual
values, makes us do things that are unnecessary and fail to do things that are nec-
essary. We fail to distinguish in our verbiage between prediction and measurement.
For example, two famous vague phrases—"It is impossible to specify..." and "It is
impossible to define..."—can be interpreted equally well as statements about
prediction or about measurement. Thus, the demonstrably correct statement that the
present theory cannot predict something becomes twisted into the almost certainly
false claim that the experimentalist cannot measure it!
We routinely commit the Mind Projection Fallacy of projecting our own
thoughts out onto Nature, supposing that creations of our own imagination are real
386 E. T. Jaynes
properties of Nature, or our own ignorance signifies some indecision on the part of
Nature. This muddying up of the distinction between reality and our knowledge of
reality is carried to the point where we find some asserting the objective reality of
probabilities, while denying the objective reality of atoms! These sloppy habits of
language have tricked us into mystical, pre-scientific standards of logic, and leave
the meaning of any QM result simply undefined. Yet we have managed to learn how
to calculate with enough art and tact so that we come out with the right numbers!
The main suggestion we wish to make is that how we look at basic probability
theory has deep implications for the Bohr-Einstein positions. Only within the past
year has it appeared to the writer that we might be able finally to resolve these
matters in the happiest way imaginable: a reconciliation of the views of Bohr and
Einstein in which we can see that they were both right in the essentials, but just
thinking on different levels.
Einstein's thinking is always on the ontological level traditional in physics,
trying to describe the realities of Nature. Bohr's thinking is always on the episte-
mological level, describing not reality but only our information about reality. The
peculiar flavor of his language arises from the absence of words with any ontolog-
ical import; the notion of a "real physical situation" was just not present and he
gave evasive answers to questions of the form: "What is really happening?" Eugene
Wigner24 was acutely aware of and disturbed by this evasiveness when he remarked:
These Copenhagen people are so clever in their use of language that, even
after they have answered your question, you still don't know whether the
answer was "yes" or "no"!
J. R. Oppenheimer, more friendly to the Copenhagen viewpoint, tried to explain
it in his lectures in Berkeley in the 1946-47 school year. Oppy anticipated multiple-
valued logic when he told us:
Consider an electron in the ground state of the hydrogen atom. If you ask,
"Is it moving?," the answer is "no." If you ask, "Is it standing still?," the
answer is "no."
Those who, like Einstein (and, up till recently, the present writer) tried to
read ontological meaning into Bohr's statements, were quite unable to understand
his message. This applies not only to his critics but equally to his disciples, who
undoubtedly embarrassed Bohr considerably by offering such exegeses as, "Instan-
taneous quantum jumps are real physical events," or "The variable is created by
the act of measurement," or the remark of Pauli quoted above, which might be
rendered loosely as, "Not only are you and I ignorant of x and p; Nature herself
does not know what they are."
Critics who tried to summarize Bohr's position sarcastically as, "If I can't
measure it, then it doesn't exist!," were perhaps closer in some ways to his actual
thinking than were his disciples. Of course, while Bohr studiously avoided all as-
sertions of "reality," he did not carry this to the point of denying reality; he was
Probability in Quantum Theory 387
merely silent on the issue, and would prefer to say, simply: "If we can't measure it,
then we can't use it for prediction."
Although Bohr's whole way of thinking was very different from Einstein's,
it does not follow that either was wrong. In the writer's view, all of Einstein's
thinking—in particular the EPR argument—remains valid today, when we take
into account its ontological character. But today, when we are beginning to con-
sider the role of information for science in general, it may be useful to note that
we are finally taking a step in the epistemological direction that Bohr was trying
to point out sixty years ago.
This statement applies only to the general philosophical position that the role
of human information in science needs to be recognized and taken into account
explicitly. Of course, it does not mean that every technical detail of Bohr's work is
to remain unchanged for all time. Our present QM formalism is a peculiar mixture
describing in part laws of Nature and in part incomplete human information about
Nature—all scrambled up together by Bohr into an omelette that nobody has seen
how to unscramble. Yet we think that the unscrambling is a prerequisite for any
further advance in basic physical theory and we want to speculate on the proper
tools to do this.
We suggest that the proper tool for incorporating human information into sci-
ence is simply probability theory—not the currently taught "random variable" kind,
but the original "logical inference" kind of James Bernoulli and Laplace. For histori-
cal reasons explained elsewhere,11 this is often called "Bayesian probability theory."
When supplemented by the notion of information entropy, this becomes a mathe-
matical tool for scientific reasoning of such power and versatility that we think it
will require a century to explore its capabilities. But the preliminary development
of this tool and testing it on simple problems is now fairly well in hand, as described
below.
A job for the immediate future is to see whether, by proper choice of variables,
Bohr's omelette can be seen as a kind of approximation to it. In the 1950's, Richard
Feynman noted that some of the probabilities in quantum theory obey different
rules (interference of path amplitudes) than do the classical probabilities. But more
recently12 we have found that the QM probabilities involved in the EPR scenario are
strikingly similar to the Bayesian probabilities, often identical; we interpret Bohr's
reply to EPR as a recognition of this. That is, Bohr's explanation of the EPR
experiment is a fairly good statement of Bayesian inference. Therefore, the omelette
does have some discernible structure of the kind that we would need in order to
unscramble it.
their own ignorance; it is apparent that their tactics amount to mere chanting of
ideological slogans, while simply ignoring the relevant, demonstrable technical facts.
But the failure of our critics to find inconsistencies does not prove that our
methods have any positive value for science. Are there any new useful results to be
had from using probability theory as logic? Some are reported in the proceedings
volumes of the Annual (since 1981) MAXENT workshops, particularly the one
in Cambridge, England, in August 1988,12 wherein a generalized Second Law of
Thermodynamics is used in what we think is the first quantitative application of
the second law in biology. But, unfortunately, most of the problems solvable by
pencil-and-paper methods were too trivial to put this issue to a real test; although
the results never conflicted with common sense, neither did they extend it very
far beyond what common sense could see or what "random variable" probability
theory could also derive.
Only recently, thanks to the computer, has it become feasible to solve real,
nontrivial problems of reasoning from incomplete information, in which we use
probability theory as a form of logic in situations where both intuition and "random
variable" probability theory would be helpless. This has brought out the facts in a
way that can no longer be obscured by arguments over philosophy. It is not easy
to argue with a computer printout, which says to us: "Independently of all your
philosophy, here are the facts about what this method actually gives when applied."
The "MAXENT" program developed by John Skilling, Steve Gull, and their
colleagues at Cambridge University, England can maximize entropy numerically in
a space of 1,000,000 dimensions, subject to 2,000 simultaneous constraints. The
"Bayesian" data-analysis program developed by G. L. Bretthorst2 at Washington
University, St. Louis, can eliminate a hundred uninteresting parameters and give
the simultaneous best estimates of twenty interesting ones and their accuracy, or it
can take into account all the parameters in a set of possible theories or "models"
and give us the relative probabilities of the theories in the light of the data. It was
interesting, although to us not surprising, to find that this leads automatically to
a quantitative statement of Occam's Razor: prefer the simpler theory unless the
other gives a significantly better fit to the data.
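Numerical entropy maximization under constraints can be illustrated in miniature (my own choice of example, the classic "Brandeis dice" exercise, not one of the Cambridge or St. Louis computations): find the maximum-entropy distribution over die faces 1..6 whose mean is 4.5. The maximizing distribution has the exponential form p_k ∝ exp(λk), and bisection on λ enforces the mean constraint.

```python
import math

# Maximum-entropy distribution over faces 1..6 with mean 4.5.
faces = range(1, 7)

def mean_for(lam):
    # mean face value of the exponential-family distribution exp(lam*k)
    w = [math.exp(lam * k) for k in faces]
    return sum(k * wk for k, wk in zip(faces, w)) / sum(w)

lo, hi = 0.0, 5.0          # mean_for(0) = 3.5, mean_for(5) is near 6
for _ in range(200):       # bisection on the mean constraint
    mid = 0.5 * (lo + hi)
    if mean_for(mid) < 4.5:
        lo = mid
    else:
        hi = mid

lam = 0.5 * (lo + hi)
w = [math.exp(lam * k) for k in faces]
p = [wk / sum(w) for wk in w]
print(p)   # increasing weights, tilted toward the high faces
```

The production programs mentioned in the text do the same kind of constrained maximization, but in a space of a million dimensions rather than six.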
Many computer printouts have now been made at Cambridge University, of
image reconstructions in optics and radio astronomy, and at Washington University
in analysis of economic, geophysical, and nuclear magnetic resonance data. The
results were astonishing to all of us; they could never have been found, or guessed,
by hand methods.
In particular, the Bretthorst programs3,4,5 extract far more information from
NMR data (where the ideal sinusoidal signals are corrupted by decay) than could
the previously used Fourier transform methods. No longer does decay broaden the
spectrum and obscure the information about oscillation frequencies; the result is
an order-of-magnitude-better resolution.
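The effect of decay on resolution can be caricatured in a few lines (my own toy construction, not Bretthorst's programs; all signal parameters are invented): a decaying sinusoid's Fourier line is broadened to a width of about 1/(πτ), yet a grid search with the decaying model pinpoints the frequency far inside that width.

```python
import math

# Decaying sinusoid: exp(-t/tau) * sin(2*pi*f0*t), sampled at fs for T s.
f0, tau, fs, T = 10.0, 0.2, 200.0, 2.0   # invented test values
n = int(fs * T)
t = [k / fs for k in range(n)]
y = [math.exp(-tk / tau) * math.sin(2 * math.pi * f0 * tk) for tk in t]

def fit_stat(f):
    # correlation of the data with the decaying model at trial frequency f
    return abs(sum(yk * math.exp(-tk / tau) * math.sin(2 * math.pi * f * tk)
                   for tk, yk in zip(t, y)))

grid = [8.0 + 0.01 * i for i in range(401)]   # trial frequencies, 8..12 Hz
f_hat = max(grid, key=fit_stat)

fourier_width = 1.0 / (math.pi * tau)   # decay-broadened linewidth, ~1.6 Hz
print(f_hat, fourier_width)
```

The matched search lands on the true 10 Hz to within the grid spacing, even though the Fourier line it sits under is more than a hundred times wider.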
Less spectacular numerically, but equally important in principle, they yield fun-
damental improvements in extracting information from economic time series when
the data are corrupted by trend and seasonality; no longer do these obscure the
information that we are trying to extract from the data. Conventional "random
variable" probability theory lacks the technical means to eliminate nuisance pa-
rameters in this way, because it lacks the concept of "probability of a hypothesis."
In other words, there is no need to shout: it is now a very well-demonstrated
fact that, after all criticisms of its underlying philosophy, probability theory inter-
preted and used as the logic of human inference does rather well in dealing with
problems of scientific reasoning—just as James Bernoulli and Laplace thought it
would, back in the 18th Century.
Our probabilities and the entropies based on them are indeed "subjective" in
the sense that they represent human information; if they did not, they could not
serve their purpose. But they are completely "objective" in the sense that they
are determined by the information specified, independently of anyone's personality,
opinions, or hopes. It is "objectivity" in this sense that we need if information is
ever to be a sound basis for new theoretical developments in science.
The first difficulty we encounter upon any suggestion that probabilities in quan-
tum theory might represent human information is the barrage of criticism from
those who believe that dispersions (ΔF)² = ⟨F²⟩ − ⟨F⟩² represent experimentally
observable "quantum fluctuations" in F. Some even claim that these fluctuations
are real physical events that take place constantly whether or not any measurement
is being made (although, of course, that does violence to Bohr's position). At the
1966 Rochester Coherence Conference, Roy Glauber assured us that vacuum fluc-
tuations are "very real things" and that any attempts to dispense with EM field
quantization are therefore doomed to failure. It can be reported that he was widely
and enthusiastically believed.
Now in basic probability theory, ΔF represents fundamentally the accuracy
with which we are able to predict the value of F. This does not deny that it may
also be the variability seen in repeated measurements of F, but the point is that
they need not be the same. To suppose that they must be the same is to commit an
egregious form of the Mind Projection Fallacy; the fact that our information is able
to determine F only to five percent accuracy is not enough to make it fluctuate by
five percent! However, it is almost right to say that, given such information, any
observed fluctuations are unlikely to be greater than five percent.
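The distinction can be caricatured numerically (all numbers invented): our information fixes F only to five percent, but the one true value, once fixed by Nature, does not fluctuate at all between ideal measurements.

```python
import random

random.seed(0)
F_predicted, sigma = 100.0, 5.0              # predictive mean and spread
F_true = random.gauss(F_predicted, sigma)    # Nature's single fixed value

measurements = [F_true for _ in range(10)]   # noiseless repeated readings

predictive_spread = sigma                    # uncertainty of our prediction
observed_spread = max(measurements) - min(measurements)  # what we observe

print(predictive_spread, observed_spread)    # 5.0 versus 0.0
```

The predictive spread describes our state of knowledge; the observed spread describes the apparatus. Conflating the two is exactly the fallacy at issue.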
Let us analyze in depth the single example of EM field fluctuations, and show
that (1) the experimental facts do not require vacuum fluctuations to be real events
after all; (2) Bayesian probability at this point is not only consistent with the ex-
perimental facts, it offers us some striking advantages in clearing up past difficulties
that have worried generations of physicists.
density W_zp in space, the Kepler ratio for a planet of mean distance R from the
sun would be changed to
\[ \frac{R^3}{T^2} = \frac{G}{4\pi^2}\left[M_{\rm sun} + \frac{4\pi R^3}{3c^2}\,W_{zp}\right] . \tag{4} \]
Numerical analysis of this shows that, in order to avoid conflict with the observed
Kepler ratios of the outer planets, the upper frequency cutoff for the ZP energy
would have to be taken no higher than optical frequencies.
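The numerical analysis referred to above can be sketched directly. This is an illustrative check, not the paper's own computation; it assumes the standard ZP spectral density ρ_zp(ω) = ħω³/2π²c³ (so the integrated density up to a cutoff ω_c is ħω_c⁴/8π²c³), and approximate CGS values for the solar mass and Neptune's orbit:

```python
import numpy as np

# Hedged numerical sketch: compare the gravitating mass of zero-point
# energy enclosed within Neptune's orbit against the solar mass, for an
# optical cutoff and a Compton cutoff.  CGS units; all values approximate.

hbar  = 1.055e-27      # erg s
c     = 3.0e10         # cm/s
M_sun = 1.99e33        # g
R_nep = 4.5e14         # cm, Neptune's mean orbital radius (approximate)

def zp_mass_fraction(w_cut):
    """Fractional perturbation of the Kepler ratio R^3/T^2 in Eq. (4):
    the ZP mass enclosed within R_nep divided by the solar mass."""
    W_zp = hbar * w_cut**4 / (8.0 * np.pi**2 * c**3)      # erg/cm^3
    M_zp = (4.0 * np.pi / 3.0) * R_nep**3 * W_zp / c**2   # g
    return M_zp / M_sun

frac_optical = zp_mass_fraction(3.0e15)   # optical cutoff ~ 3e15 rad/s
frac_compton = zp_mass_fraction(7.8e20)   # Compton cutoff m_e c^2 / hbar

print(f"optical cutoff: dM/M ~ {frac_optical:.1e}")
print(f"Compton cutoff: dM/M ~ {frac_compton:.1e}")
```

With the optical cutoff the perturbation is utterly negligible, while the ω⁴ growth makes the Compton cutoff exceed the solar mass itself, in line with the statement that the solar system would be disrupted.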
But attempts to account for the Lamb shift by ZP fluctuations would require
a cutoff thousands of times higher, at the Compton wavelength. The gravitational
field from that energy density would not just perturb the Kepler ratio; it would
completely disrupt the solar system as we know it.
The difficulty would disappear if one could show that the aforementioned ef-
fects have a different cause, and ZP field energy is not needed to account for any
experimental facts. Let us try first with the simplest effect, spontaneous emission.
The hypothesized zero-point energy density in a frequency band Δω is
\[ W_{zp} = \rho_{zp}(\omega)\,\Delta\omega = \frac{\hbar\omega^3}{2\pi^2 c^3}\,\Delta\omega . \tag{5} \]
An atom with the spontaneous emission rate
\[ A = \frac{4p^2\omega_0^3}{3\hbar c^3} , \tag{6} \]
where p is the dipole moment matrix element for the transition, sees this over an
effective bandwidth
\[ \Delta\omega = \frac{\int I(\omega)\,d\omega}{I(\omega_0)} = \frac{\pi A}{2} , \tag{7} \]
the line shape being the Lorentzian
\[ I(\omega) \propto \frac{1}{(\omega-\omega_0)^2 + (A/2)^2} . \tag{8} \]
The effective zero-point energy density that the radiating atom actually sees (the
electric part of the ZP energy in the one field component along the dipole) is then
\[ W_{zp,\,\rm eff} = \tfrac{1}{6}\,\rho_{zp}(\omega_0)\,\Delta\omega = \frac{p^2\omega_0^6}{18\pi c^6}\ \text{ergs/cm}^3 , \tag{9} \]
and it seems curious that Planck's constant has cancelled out. This indicates the
magnitude of the electric field that a radiating atom sees according to the ZP theory.
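The effective-bandwidth integral in Eq. (7) is easy to verify numerically. A minimal sketch with arbitrary parameters (not taken from the paper), checking that ∫I(ω)dω / I(ω₀) = πA/2 for the Lorentzian line shape of Eq. (8):

```python
import numpy as np

# Hedged numerical check of Eq. (7): for I(w) ∝ 1/((w-w0)^2 + (A/2)^2),
# the ratio of the integrated line to its peak value should be pi*A/2.
# w0 and A are arbitrary illustrative values.

w0, A = 5.0, 0.01
w = np.linspace(w0 - 2000 * A, w0 + 2000 * A, 400_001)
I = 1.0 / ((w - w0)**2 + (A / 2.0)**2)

dw_eff = np.trapz(I, w) / I.max()   # effective bandwidth
print(dw_eff, np.pi * A / 2.0)
```

The small residual difference comes only from truncating the Lorentzian tails at ±2000 linewidths.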
On the other hand, the classical radiation reaction field generated by a dipole
of moment p:
\[ E_{RR} = \frac{2}{3c^3}\,\frac{d^3 p}{dt^3} = \frac{2\omega^3 p}{3c^3} , \tag{10} \]
394 E. T. Jaynes
corresponds to an energy density
\[ W_{RR} = \frac{E_{RR}^2}{8\pi} = \frac{p^2\omega^6}{18\pi c^6}\ \text{ergs/cm}^3 . \tag{11} \]
But Eqs. (9) and (11) are identical! A radiating atom is indeed interacting with an
electric field of just the magnitude predicted by the zero-point calculation, but this
is the atom's own radiation reaction field.
Now we can see that this needed field is generated by the radiating atom,
automatically but in a more economical way; only where it is needed, when it is
needed, and in the frequency band needed. Spontaneous emission does not require
an infinite energy density throughout all space. Surely, this is a potentially far more
satisfactory way of looking at the mechanism of spontaneous emission (if we can
clear up some details about the dynamics of the process).
But then someone will point immediately to the Lamb shift; does this not
prove the reality of the ZP energy? Indeed, Schwinger17,18 and Weisskopf 22 stated
explicitly that ZP field fluctuations are the physical cause of the Lamb shift, and
Welton23 gave an elementary "classical" derivation of the effect from this premise.
Even Niels Bohr concurred. To the best of our knowledge, the closest he ever
came to making an ontological statement was uttered while perhaps thrown mo-
mentarily off guard under the influence of Schwinger's famous eight-hour lecture at
the 1948 Pocono conference. As recorded in John Wheeler's notes on that meeting,
Bohr says: "It was a mistake in the older days to be discontented with field and
charge fluctuations. They are necessary for the physical interpretation."
In 1953 Dyson7 also concurred, picturing the quantized field as something akin
to hydrodynamic flow with superposed random turbulence, and he wrote: "The
Lamb-Retherford experiment is the strongest evidence we have for believing that
our picture of the quantum field is correct in detail." Then in 1961 Feynman sug-
gested that it should be possible to calculate the Lamb shift from the change in
total ZP energy in space due to the presence of a hydrogen atom in the 2s state;
and in 1966 E. A. Power16 gave the calculation demonstrating this in detail. How
can we possibly resist such a weight of authority and factual evidence?
As it turns out, quite easily. The problem has been that these calculations
have been done heretofore only in a quantum field theory context. Because of this,
people jumped to the conclusion that they were quantum effects (i.e., effects of
field quantization), without taking the trouble to check whether they were present
also in classical theory. As a result, two generations of physicists have regarded the
Lamb shift as a deep, mysterious quantum effect that ordinary people cannot hope
to understand. So we are facing not so much a weight of authority and facts as a
mass of accumulated folklore.
Since our aim now is only to explain the elementary physics of the situation
rather than to give a full formal calculation, let us show that this radiative fre-
quency shift effect was present already in classical theory, and that its cause lies
simply in properties of the source field (Eq. (1)), having nothing to do with field
fluctuations. In fact, by stating the problem in Hamiltonian form, we can solve
it exactly: consider an extra oscillator (EO) of frequency Ω coupled linearly to a
set of field-mode oscillators,
Probability in Quantum Theory 395
\[ H = \frac{1}{2}\sum_i \left(p_i^2 + \omega_i^2 q_i^2\right) + \frac{1}{2}\left(P^2 + \Omega^2 Q^2\right) - \sum_i a_i q_i Q . \tag{12} \]
The physical effects of coupling the EO to the field variables may be calculated in
two "complementary" ways:
(I) Dynamic: how are the EO oscillations modified by the field coupling?
(II) Static: what is the new distribution of normal mode frequencies?
The new normal modes are the roots {νⱼ} of the equation Ω² − ν² = K(ν),
where K(ν) is the dispersion function
\[ K(\nu) \equiv \sum_i \frac{a_i^2}{\omega_i^2 - \nu^2} = \int_0^\infty K(t)\,e^{-st}\,dt , \qquad s = i\nu . \tag{13} \]
Let us solve the problem first in the more familiar dynamical way. With initially
quiescent field modes qᵢ(0) = q̇ᵢ(0) = 0, the decay of the extra oscillator is found
to obey a Volterra equation:
\[ \ddot{Q}(t) + \Omega^2 Q(t) = \int_0^t K(t-t')\,Q(t')\,dt' . \tag{14} \]
Thus K(t) is a memory function and the integral in Eq. (14) is a source field. For
arbitrary initial EO conditions Q(0), Q̇(0), the solution is
\[ Q(t) = Q(0)\,\dot{G}(t) + \dot{Q}(0)\,G(t) \tag{15} \]
with the Green's function
\[ G(t) = \frac{1}{2\pi}\int \frac{e^{i\nu t}\,d\nu}{\Omega^2 - \nu^2 - K(\nu)} , \tag{16} \]
where the contour goes under the poles on the real axis. This is the exact decay
solution for arbitrary field mode patterns.
In the limit of many field modes, this goes into a simpler form. There is a mode
density function ρ₀(ω):
\[ \sum_i (\;) \;\longrightarrow\; \int_0^\infty (\;)\,\rho_0(\omega)\,d\omega . \tag{17} \]
Then from Eq. (13), K(ν) goes into a slowly varying function on the path of inte-
gration in Eq. (16):
\[ K(\nu) \;\longrightarrow\; {\rm P}\!\int_0^\infty \frac{a^2(\omega)\rho_0(\omega)\,d\omega}{\omega^2-\nu^2} \;-\; \frac{i\pi a^2(\nu)\rho_0(\nu)}{2\nu} \;=\; -2\Omega\,(\Delta + i\Gamma) , \tag{18} \]
and neglecting some small terms, the resulting Green's function goes into
\[ G(t) \;\longrightarrow\; e^{-\Gamma t}\,\frac{\sin(\Omega+\Delta)t}{(\Omega+\Delta)} , \tag{19} \]
where
\[ \Gamma(\Omega) = \frac{\pi a^2(\Omega)\,\rho_0(\Omega)}{4\Omega^2} \tag{20} \]
and
\[ \Delta(\Omega) = \frac{1}{2\Omega}\,{\rm P}\!\int_0^\infty \frac{a^2(\omega)\rho_0(\omega)\,d\omega}{\Omega^2-\omega^2} = \frac{1}{\pi}\,{\rm P}\!\int_{-\infty}^{\infty} \frac{\Gamma(\omega)\,d\omega}{\Omega-\omega} \tag{21} \]
are the "spontaneous emission rate" and "radiative frequency shift" exhibited by
the EO due to its coupling to the field modes. We note that Δ(Ω) and Γ(Ω) form
a Hilbert transform pair (a Kramers-Kronig-type dispersion relation expressing
causality). In this approximation, Eq. (15) becomes the standard exponentially
damped solution of a linear differential equation with loss: Q̈ + 2ΓQ̇ + (Ω + Δ)²Q = 0.
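The whole dynamical calculation can be sketched numerically by diagonalizing the quadratic Hamiltonian of Eq. (12) exactly for a finite set of modes. In this hedged illustration (all parameter values arbitrary, uniform coupling a(ω) = a assumed), the EO amplitude decays at the rate Γ = πa²ρ₀/4Ω² of Eq. (20), even though every field mode starts strictly quiescent:

```python
import numpy as np

# Hedged sketch: an extra oscillator (EO) of frequency Omega coupled to N
# initially quiescent field modes.  We diagonalize x'' = -M x exactly and
# check that Q(t) decays at Gamma = pi*a^2*rho0/(4*Omega^2), Eq. (20).

Omega = 1.0
N     = 1200
w     = np.linspace(0.4, 1.6, N)          # field-mode frequencies
rho0  = (N - 1) / (w[-1] - w[0])          # mode density
a     = 3.57e-3                           # uniform coupling (illustrative)
gamma_theory = np.pi * a**2 * rho0 / (4.0 * Omega**2)

# Force matrix for x = (Q, q_1, ..., q_N); symmetric, from Eq. (12).
M = np.diag(np.concatenate(([Omega**2], w**2)))
M[0, 1:] = M[1:, 0] = -a
nu2, V = np.linalg.eigh(M)
nu = np.sqrt(nu2)

# Q(0) = 1, everything else zero -> Q(t) = sum_k V[0,k]^2 cos(nu_k t)
t  = np.linspace(0.0, 250.0, 2000)
ck = V[0, :]**2
Q  = (ck[:, None] * np.cos(np.outer(nu, t))).sum(axis=0)
Qd = -(ck[:, None] * nu[:, None] * np.sin(np.outer(nu, t))).sum(axis=0)

amp = np.sqrt(Q**2 + (Qd / Omega)**2)     # slowly varying envelope
fit = (t > 30) & (t < 220)
rate_measured = -np.polyfit(t[fit], np.log(amp[fit]), 1)[0]
print(rate_measured, gamma_theory)
```

The simulated time span is kept well below the recurrence time 2πρ₀, so the finite mode set behaves like a continuum; no fluctuating field appears anywhere in the calculation.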
As a check, it is a simple homework problem to compare our damping factor r
with the well-known Larmor radiation law, by inserting into the above formulas the
free-space mode density function ρ₀(ω) = Vω²/π²c³, and the coupling coefficients
aᵢ appropriate to an electric dipole of moment p proportional to Q. We then find
\[ \Gamma(\omega) = \left(\frac{\pi}{4\Omega^2}\right)\left(\frac{4\pi\omega^2 p^2}{3V}\right)\left(\frac{V\omega^2}{\pi^2 c^3}\right) = \frac{p^2\omega^4}{3\Omega^2 c^3}\ \ {\rm sec}^{-1} , \tag{22} \]
and it is easily seen that for the average energy loss over a cycle this agrees exactly
with the Larmor formula
2c0 . ,
Prad = 3 kx? (23)
for radiation from an accelerated particle. In turn, the correspondence between the
Larmor radiation rate and the Einstein A-coefficient (6) is well-known textbook
material.
It is clear from this derivation that the spontaneous emission and the radiative
frequency shift do not require field fluctuations, since we started with the explicit
initial condition of a quiescent field: qᵢ(0) = q̇ᵢ(0) = 0. The damping and shifting are due
entirely to the source field reacting back on the source, as expressed by the integral
in Eq. (14).
Of course, although the frequency shift formula (21) resembles the "Bethe loga-
rithm" expression for the Lamb shift, we cannot compare them directly because our
model is not a hydrogen atom; we have no s-states and p-states. But if we use values
of aᵢ and Ω for an electron oscillating at optical frequencies and use a cutoff corre-
sponding to the size of the hydrogen atom, we get shifts of the order of magnitude
of the Lamb shift. A more elaborate calculation will be reported elsewhere.
But now this seems to raise another mystery; if field fluctuations are not the
cause of the Lamb shift, then why did the aforementioned Welton and Power calcu-
lations succeed by invoking those fluctuations? We face here a very deep question
about the meaning of "fluctuation-dissipation theorems." There is a curious math-
ematical isomorphism; throughout this century, starting with Einstein's relation
between diffusion coefficient and mobility, D = ⟨(δx)²⟩/2t = kTμ, and the Nyquist
thermal noise formula for a resistor, ⟨(δV)²⟩ = 4kTRΔf, theoreticians have been deriv-
ing a steady stream of relations connecting "stochastic" problems with dynamical
problems.
Indeed, for every differential equation with a non-negative Green's function,
there is an obvious stochastic problem which would have the same mathematical
solution even though the problems are quite unrelated physically, but as Mark
Kac14 showed, the mathematical correspondence between stochastic and dynamical
problems is much deeper and more general than that.
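The simplest instance of this correspondence can be exhibited in a few lines. A hedged sketch (arbitrary parameters, not from the paper): the Green's function of the diffusion equation u_t = D u_xx is also the transition density of a Brownian motion with ⟨dx²⟩ = 2D dt, so the purely "dynamical" PDE problem and the purely "stochastic" walker problem share one solution:

```python
import numpy as np

# Hedged illustration of the stochastic/dynamical correspondence: the
# sample variance of simulated Brownian walkers should match the variance
# 2*D*T of the heat-kernel Green's function.  D, T, counts are arbitrary.

rng = np.random.default_rng(0)
D, T, n_steps, n_walk = 0.5, 2.0, 200, 100_000
dt = T / n_steps

x = np.zeros(n_walk)
for _ in range(n_steps):                  # simulate Brownian paths
    x += rng.normal(0.0, np.sqrt(2.0 * D * dt), n_walk)

var_sample = x.var()
var_theory = 2.0 * D * T                  # heat-kernel variance
print(var_sample, var_theory)
```

Nothing here says the walkers are "really there" in the PDE problem; the two problems merely share mathematics, which is the point of the text above.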
These relations do not prove that the fluctuations are real; they show only that
certain dissipative effects (i.e., disappearance of the extra oscillator energy into the
field modes) are the same as if fluctuations were present. But then by the Hilbert
transform connection noted, the corresponding reactive effects must also be the
same as if fluctuations were present; the calculation of Welton23 shows how this
comes about.
But this still leaves a mystery surrounding the Feynman-Power calculation,
which obtains the Lamb shift from the change in total ZP energy in the space
surrounding the hydrogen atom; let us explain how that can be.
To calculate the mode density increment, we need to evaluate the limiting form of
the dispersion function K(v) more carefully than in Eq. (18).
From the Hamiltonian (12), the normal modes are the roots {νₖ} of the disper-
sion equation
\[ \Omega^2 - \nu^2 = K(\nu) = \sum_i \frac{a_i^2}{\omega_i^2 - \nu^2} . \tag{25} \]
K(ν) resembles a tangent function, having poles at the free field mode frequencies ωᵢ
and zeroes close to midway between them. Suppose that the unperturbed frequency
Ω of the EO lies in the cell (ωᵢ < Ω < ωᵢ₊₁). Then the field modes above it are
raised by amounts δνₖ = νₖ − ωₖ, k = i+1, i+2, ..., n. The field modes below it are
lowered by δνₖ = νₖ₋₁ − ωₖ, k = 1, 2, ..., i; and one new normal mode νᵢ appears
in the same cell as Ω: (ωᵢ < νᵢ < ωᵢ₊₁). The separation property (exactly one new
mode νₖ lies between any two adjacent old modes ωₖ) places a stringent limitation
on the magnitude of any static mode shift δνₖ.
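The separation property is easy to confirm by root-finding on the dispersion equation. A hedged toy example (a small arbitrary spectrum, uniform couplings) of Eq. (25), showing exactly one normal mode below the band, one above it, and one in each cell between adjacent old modes — N + 1 modes in all:

```python
import numpy as np
from scipy.optimize import brentq

# Hedged illustration of the separation property for the roots of
# Omega^2 - nu^2 = K(nu) = sum_i a_i^2/(w_i^2 - nu^2), Eq. (25).

Omega = 1.0
w = np.linspace(0.5, 1.5, 11)        # old field-mode frequencies
a = 0.05 * np.ones_like(w)           # coupling constants (arbitrary)

def f(nu):
    # dispersion function; sign changes exactly once between poles
    return Omega**2 - nu**2 - np.sum(a**2 / (w**2 - nu**2))

eps = 1e-9
edges = np.concatenate(([1e-3], w, [3.0]))   # bracket every cell
roots = [brentq(f, lo + eps, hi - eps)
         for lo, hi in zip(edges[:-1], edges[1:])]
print(len(roots))
```

Each bracketing interval contains one sign change of f, so `brentq` returns one root per cell; the count comes out to len(w) + 1, the extra mode "inserted into the gap" as described next.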
Thus the original field modes ωᵢ are, so to speak, pushed aside by a kind of
repulsion from the added frequency Ω, and one new mode is inserted into the gap
thus created. If there are many field modes, the result is a slight increase ρ₁(ν) in
mode density in the vicinity of Ω. To calculate it, note that if the field mode ωᵢ is
shifted a very small amount to νₖ = ωᵢ + δν, and δν varies with ωᵢ, then the mode
density is changed to
\[ \rho(\omega) = \rho_0(\omega) + \rho_1(\omega) , \qquad \rho_1(\omega) = -\frac{d}{d\omega}\bigl[\rho_0(\omega)\,\delta\nu(\omega)\bigr] . \tag{26} \]
In the continuum limit, ρ₀ → ∞ and δν → 0; however, the increment ρ₁(ω) remains
finite and, as we shall see, loaded with physical meaning.
We now approximate the dispersion function K(ν) more carefully. In Eq. (16),
where Im(ν) < 0, we could approximate it merely by the integral, since the local
behavior (the infinitely fine-grained variation in K(ν) from one pole to the next)
cancels out in the limit at any finite distance from the real axis. But now we need
it exactly on the real axis, and those fine-grained local variations are essential,
because they provide the separation property that limits the static mode shifts δν.
Consider the case where ωᵢ > Ω and ν lies in the cell (ωᵢ < ν < ωᵢ₊₁). Then
the modes are pushed up. If the old modes near ν are about uniformly spaced, we
have for small n, ωᵢ₊ₙ ≈ ωᵢ + n/ρ₀(ω); therefore
\[ \omega_{i+n}^2 - \nu^2 \simeq \frac{2\nu}{\rho_0}\,(n - \rho_0\,\delta\nu) , \tag{27} \]
and the sum of terms with poles near ν goes into
\[ \sum_n \frac{a_{i+n}^2}{\omega_{i+n}^2-\nu^2} \simeq \frac{a^2(\nu)\rho_0(\nu)}{2\nu}\sum_n \frac{1}{n-\rho_0\,\delta\nu} = -\frac{\pi a^2(\nu)\rho_0(\nu)}{2\nu}\,\cot[\pi\rho_0(\nu)\,\delta\nu] , \tag{28} \]
where we supposed the aᵢ slowly varying and recognized the Mittag-Leffler expan-
sion π cot πx = Σₙ (x − n)⁻¹. The contribution of poles far from ν can again be
represented by an integral. Thus, on the real axis, the dispersion function goes, in
the continuum limit, into
\[ K(\nu) \simeq -\frac{\pi a^2(\nu)\rho_0(\nu)}{2\nu}\,\cot[\pi\rho_0(\nu)\,\delta\nu] + {\rm P}\!\int_0^\infty \frac{a^2(\omega)\rho_0(\omega)\,d\omega}{\omega^2-\nu^2} . \]
But in this we recognize our expressions (20) and (21) for Γ and Δ:
\[ K(\nu) \simeq -2\Omega\bigl[\Delta + \Gamma\,\cot(\pi\rho_0\,\delta\nu)\bigr] . \tag{29} \]
As a check, note that if we continue δν below the real axis, the cotangent goes
into cot(−ix) → +i, and we recover the previous result (18). Thus if we again
assume a sharp resonance (Ω ≈ ν) and write the dynamically shifted frequency as
ω₀ = Ω + Δ, the dispersion relation (25) becomes a formula for the static mode
shift δν:
\[ \pi\rho_0(\nu)\,\delta\nu = \tan^{-1}\!\left(\frac{\Gamma}{\nu-\omega_0}\right) , \tag{30} \]
and (26) then yields for the increment in mode density a Lorentzian function:
\[ \rho_1(\nu)\,d\nu = \frac{1}{\pi}\,\frac{\Gamma\,d\nu}{(\nu-\omega_0)^2+\Gamma^2} , \tag{31} \]
with the same shift and width as we found in the dynamical calculation (14).
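The step from Eq. (30) to Eq. (31) can be checked numerically. A hedged sketch with arbitrary parameters: taking the arctangent branch in (0, π), the derivative of the static mode shift reproduces the Lorentzian increment, whose integral is unity:

```python
import numpy as np

# Hedged check that rho1 = -d(rho0*dnu)/dnu, with
# pi*rho0*dnu = arctan(Gamma/(nu - w0)) taken in (0, pi), is the
# normalized Lorentzian (Gamma/pi)/((nu-w0)^2 + Gamma^2).

w0, Gamma = 1.0, 0.01
nu = np.linspace(w0 - 50 * Gamma, w0 + 50 * Gamma, 200_001)

rho0_dnu   = np.arctan2(Gamma, nu - w0) / np.pi    # branch in (0, 1)
rho1_num   = -np.gradient(rho0_dnu, nu)
rho1_exact = (Gamma / np.pi) / ((nu - w0)**2 + Gamma**2)

max_rel_err = np.max(np.abs(rho1_num - rho1_exact)) / rho1_exact.max()
mass = np.trapz(rho1_exact, nu)     # approaches 1 over all nu
print(max_rel_err, mass)
```

Using `arctan2` keeps the shift continuous through ν = ω₀, which is exactly the branch choice needed for the normalization ∫ρ₁ dν = 1 discussed below.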
As a check, note that the increment is normalized, ∫ρ₁ dν = 1, as it should be,
since the "macroscopic" effect of the coupled EO is just to add one more new mode to
the system. Note also that the result (31) depended on K(ν) going locally into a
tangent function. If for any reason (e.g., highly nonuniform mode spacing or coupling
constants, even in the limit) K(ν) does not go into a tangent function, we will not
get a Lorentzian p1(v). This would signify perturbing objects in the field, or cavity
walls that do not recede to infinity in the limit, so echoes from them remain.
But the connection (32) between the mode density increment and the decay
law is quite general. It does not depend on the Lorentzian form of pi(v), on the
particular equation of motion for Q, on whether we have one or many resonances
Q, or indeed on any property of the perturbing EO other than the linearity of its
response.
To see this, imagine that all normal modes are shock excited simultaneously
with arbitrary amplitudes A(ν). Then the response is a superposition of all modes:
\[ x(t) = \int_0^\infty A(\nu)\,e^{i\nu t}\,\rho_0(\nu)\,d\nu \;+\; \int_0^\infty A(\nu)\,e^{i\nu t}\,\rho_1(\nu)\,d\nu . \tag{33} \]
But since the first integral represents the response of the free field, the second must
represent the "ringing" of whatever perturbing objects are present. If A(ν) is nearly
constant in the small bandwidth occupied by a narrow peak in ρ₁(ν), the resonant
ringing goes into the form (32).
Therefore, every detail of the transient decay of the dynamical problem is, so
to speak, "frozen into" the static mode density increment function ρ₁(ν) and can
be extracted by taking the Fourier transform (32). Thus a bell, excited by a pulse
of sound, will ring out at each of its resonant frequencies, each separate resonance
having a decay rate and radiative frequency shift determined by ρ₁(ν) in the vicinity
of that resonance.
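That the decay law is "frozen into" the static increment can be verified directly. A hedged numerical sketch (arbitrary parameters): the Fourier transform of the Lorentzian ρ₁(ν) of Eq. (31) rings at ω₀ with envelope e^{−Γt}:

```python
import numpy as np

# Hedged check: |∫ rho1(nu) e^{i nu t} dnu| should equal e^{-Gamma t},
# the decay law of the dynamical calculation.  Parameters arbitrary.

w0, Gamma = 1.0, 0.01
nu = np.linspace(w0 - 400 * Gamma, w0 + 400 * Gamma, 400_001)
rho1 = (Gamma / np.pi) / ((nu - w0)**2 + Gamma**2)

vals = []
for t in (50.0, 100.0, 200.0):            # Gamma*t = 0.5, 1, 2
    ring = np.trapz(rho1 * np.exp(1j * nu * t), nu)
    vals.append((t, abs(ring)))
    print(t, abs(ring), np.exp(-Gamma * t))
```

The small deviations come only from truncating the Lorentzian tails; the ringing frequency ω₀ and decay rate Γ are recovered exactly as the text describes for the bell and for the 2s hydrogen atom below.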
Then a hydrogen atom in the 2s state, excited by a sharp electromagnetic pulse,
will "ring out" at the frequencies of all the absorption or emission lines that start
from the 2s state, and information about all the rates of decay and all the radiative
line shifts is contained in the ρ₁(ν) perturbation that the presence of that atom
makes in the field-mode density.
Thus Feynman's conjecture about the relation between the Lamb shift and the
change in ZP energy of the field around that atom is now seen to correspond to a
perfectly general relation that was present all the time in classical electromagnetic
and acoustical theory, and might have been found by Rayleigh, Helmholtz, Maxwell,
Larmor, Lorentz, or Poincaré in the last century.
It remains to finish the Power-type calculation and show that simple classical
calculations can also be done by the more glamorous quantum mechanical methods
of "subtraction physics" if one wishes to do so. Suppose we put the extra oscillator
in place and then turn on its coupling to the field oscillators. Before the coupling is
turned on, we have a background mode density po(w) with a single sharp resonance,
mode density 8(w — n) superimposed. Turning on the coupling spreads this out into
pi(w), superimposed on the same background, and shifts its center frequency by
just the radiative shift A. In view of the normalization of pi(w), we can write
00
A= wpi (w)dco — St . (34)
0
Suppose, then, that we had asked a different question: "What is the total frequency
shift in all modes due to the coupling?" Before the coupling is turned on, the total
frequency is a badly divergent expression:
\[ \Bigl(\sum\omega\Bigr)_1 = \Omega + \int_0^\infty \omega\,\rho_0(\omega)\,d\omega , \tag{35} \]
and afterward it is
\[ \Bigl(\sum\omega\Bigr)_2 = \int_0^\infty \omega\,\bigl[\rho_0(\omega) + \rho_1(\omega)\bigr]\,d\omega , \tag{36} \]
which is no better. But then the total change in all mode frequencies due to the
coupling is, from Eq. (34):
\[ \Bigl(\sum\omega\Bigr)_2 - \Bigl(\sum\omega\Bigr)_1 = \Delta . \tag{37} \]
CONCLUSION
We have explored only a small part of the issues that we have raised; however, it is
the part that has seemed the greatest obstacle to a unified treatment of probability
in quantum theory. Its resolution was just a matter of getting our physics straight;
we have been fooled by a subtle mathematical correspondence between stochastic
and dynamical phenomena, into a belief that the "objective reality" of vacuum
fluctuations and ZP energy are experimental facts. With the realization that this
is not the case, many puzzling difficulties disappear.
We then see the possibility of a future quantum theory in which the role of in-
complete information is recognized: the dispersion (ΔF)² = ⟨F²⟩ − ⟨F⟩² represents
fundamentally only the accuracy with which the theory is able to predict the value
of F. This may or may not be also the variability in the measured values.
In particular, when we free ourselves from the delusion that probabilities are
physically real things, then when ΔF is infinite, that does not mean that any
physical quantity is infinite. It means only that the theory is completely unable to
predict F. The only thing that is infinite is the uncertainty of the prediction. In
our view, this represents the beginning of a far more satisfactory way of looking
at quantum theory, in which the important research problems will appear entirely
different than they do now.
REFERENCES
1. Allen, L., and J. H. Eberly. Optical Resonance and Two-Level Atoms, chap. 7.
New York: J. Wiley and Sons, 1975.
2. Bretthorst, G. L. "Bayesian Spectrum Analysis and Parameter Estimation."
Springer Lecture Notes in Statistics 48 (1988).
3. Bretthorst, G. L., C. Hung, D. A. D'Avegnon, and J. H. Ackerman. "Bayesian
Analysis of Time-Domain Magnetic Resonance Signals." J. Mag. Res. 79
(1988):369-376.
4. Bretthorst, G. L., J. J. Kotyk, and J. H. Ackerman. "31P NMR Bayesian
Spectral Analysis of Rat Brain in Vivo." Mag. Res. in Medicine 9 (1989):282-
287.
5. Bretthorst, G. L., and C. Ray Smith. "Bayesian Analysis of Signals from
Closely Spaced Objects." In Infrared Systems and Components III, edited by
Robert L. Caswell, vol. 1050. San Francisco: SPIE, 1989, 93-104.
6. Casimir, H. B. G. Proc. K. Ned. Akad. Wet. 51 (1948):635.
7. Dyson, F. J. "Field Theory." Sci. Am. (April 1953):57.
8. Jaynes, E. T. "Confidence Intervals vs. Bayesian Intervals." In Foundations
of Probability Theory, Statistical Inference, and Statistical Theories of Sci-
ence, edited by W. L. Harper and C. A. Hooker. Dordrecht-Holland: D. Rei-
del Pub. Co., 1976; reprinted in part in Ref. 10.
9. Jaynes, E. T. "Where Do We Stand on Maximum Entropy?" In The Maxi-
mum Entropy Formalism, edited by R. D. Levine and M. Tribus. Cambridge:
MIT Press, 1978; reprinted in Ref. 10.
10. Jaynes, E. T. Papers on Probability, Statistics, and Statistical Physics, edited
by R. D. Rosenkrantz. Holland: D. Reidel Publishing Co., 1983; reprints of
13 papers dated 1957-1980. Second paperback edition by Kluwer Academic
Publishers, Dordrecht, 1989.
11. Jaynes, E. T. "Bayesian Methods: General Background." In Maximum En-
tropy and Bayesian Methods in Applied Statistics, edited by J. H. Justice.
Cambridge: Cambridge University Press, 1986, 1-25.
12. Jaynes, E. T. "Clearing up Mysteries: The Original Goal." In Maximum En-
tropy and Bayesian Methods, edited by J. Skilling. Dordrecht: Kluwer Academic
Publishers, 1989, 1-27.
13. Jeffreys, H. Theory of Probability. Oxford: Oxford Univ. Press, 1939; later edi-
tions 1948, 1961, and 1966. A wealth of beautiful applications showing in de-
tail how to use probability theory as logic.
14. Kac, M. "Some Stochastic Problems in Physics and Mathematics." Collo-
quium Lectures in Pure and Applied Science, No. 2. Dallas, Texas: Field Research
Laboratory, Magnolia Petroleum Company, 1956.
15. Milonni, P., J. Ackerhalt, and R. A. Smith. "Interpretation of Radiative Cor-
rections in Spontaneous Emission." Phys. Rev. Lett. 31 (1973):958.
16. Power, E. A. "Zero-Point Energy and the Lamb Shift." Am. J. Phys. 34
(1966):516. Note that factors of 2 are missing from Eqs. (13) and (15).
INTRODUCTION
Measurements in general are performed in order to increase information about
physical systems. This information, if appropriate, may in principle be used for
a reduction of their thermodynamical entropies—as we know from the thought
construction of Maxwell's demon.
As we have been taught by authors like Smoluchowski, Szilard, Brillouin and
Gabor, one thereby has to invest at least the equivalent measure of information
(therefore also called "negentropy") about a physical system in order to reduce
its entropy by a certain amount. This is either required by the Second Law (if
it is applicable for this purpose), or it can be derived within classical statistical
mechanics by using
a. determinism and
b. the assumption that perturbations from outside may be treated stochastically
in the forward direction of time (condition of "no conspiracy").
The total ensemble entropy may then never decrease, and one can use diagrams
such as that in Figure 1 to represent sets of states for the system being measured (a, b),
the measurement and registration device (0, A, B), and the environment (A', B')
which is required for the subsequent reset of the apparatus.
In statistical arguments of this kind, no concepts from phenomenological ther-
modynamics have to be used. Statistical counting is a more fundamental notion
than energy conservation or temperature if the concept of deterministically evolv-
ing microscopic states is assumed to apply. The price to be paid for this advantage
is the problem arising from the fact (much discussed at this conference) that the
statistical ensemble entropy is not uniquely related to thermodynamical entropy.
This problem is even more important in the quantum mechanical description.
In quantum theory the statistical entropy is successfully calculated from the density
matrix (regardless of the latter's interpretation). This density matrix changes non-
unitarily (i.e., the state vectors diagonalizing it change indeterministically) in a
measurement process (a situation usually referred to as the collapse or reduction
of the wave function). So, for example, Pauli concluded that "the appearance of a
certain result in a measurement is then a creation outside the laws of nature." This
may be a matter of definition—but the state vector (as it is used to describe an
actual physical situation) is affected by the collapse, and so is the entropy calculated
from it or from the density matrix!
Must this deviation from the deterministic Schrödinger equation now lead to a
violation of the Second Law, as discussed in a beautiful way at this conference by
Peres? In particular, can Maxwell's demon possibly return through the quantum
back door?
[Figure 1: measurement ("msmt") and reset of the apparatus, with ensemble entropy S and information I passing through S = k ln 2 + c, I = 0 → S = c, I = k ln 2 → S = k ln 2 + c, I = 0.]

[Figure 3: the early radiation era as a non-ideal absorber: T = ∞ at t = 0, T ≈ 4·10³ K at t ≈ 3·10⁵ a, cooled by now to the 2.7 K background.]
where "nearby" may in astronomical situations include stars and galaxies. Eighty
years ago Ritz required that this condition should hold by law of nature, that
is, exactly if considered for the whole universe. This assumption would eliminate
the electromagnetic degrees of freedom and replace them by a retarded action at a
distance. It corresponds to a cosmological initial condition ("Sommerfeld radiation
condition") F^in = 0.
A similar proposal was made 70 years later by Penrose for gravity instead of
electrodynamics in terms of his Weyl tensor hypothesis. Both authors expressed the
expectation that their assumptions might then also explain the thermodynamical
arrow of time.
The usual explanation of the electromagnetic arrow is that it is instead caused
by the thermodynamical arrow of absorbers (see Figure 2): no field may leave an
ideally absorbing region in the forward direction of time. (The same condition is re-
quired in the Wheeler-Feynman absorber theory in addition to their time-symmetric
"absorber condition.")
The electrodynamical arrow of time can then easily be understood inside of
closed laboratories possessing absorbing walls. In cosmology (where the specific
boundary condition is referred to as Olbers' paradox) the situation is slightly differ-
ent. According to the big bang model there was a non-ideal (hot) absorber early in
the universe (the radiation era; see Figure 3). Its thermal radiation has now cooled
down to form the observed 2.7 K background radiation which is compatible with
the boundary condition at wave lengths normally used in experiments. This early
absorber hides the true F^in from view—although it is "transparent" for gravity.
However, it is important for many thermodynamical consequences that zero-
mass fields possess a very large entropy capacity ("blank paper" in the language of
information physics). This is true in particular for gravity because of the general
attractivity and self-interaction that leads to the formation of black holes.15
Quantum Measurements 409
This leads to a coupled dynamics for ρ_rel = Pρ and ρ_irrel = (1 − P)ρ according to
\[ i\,\frac{\partial \rho_{\rm rel}}{\partial t} = PL\rho_{\rm rel} + PL\rho_{\rm irrel} , \]
\[ i\,\frac{\partial \rho_{\rm irrel}}{\partial t} = (1-P)L\rho_{\rm rel} + (1-P)L\rho_{\rm irrel} . \]
Then formally solve the second equation for Pirrel(t) with pr el as an inhomo-
geneity (just as when calculating the electromagnetic field as a functional of the
sources), and insert it into the first one to get the still exact (hence reversible)
pre-master equation for pre:
\[ S = -k\int \rho_{\rm rel}\,(\ln \rho_{\rm rel})\,dp\,dq \]
to obtain dS/dt ≥ 0, that is, a monotonic loss of relevant information. In gen-
eral, however, the equality sign dS/dt ≈ 0 would be overwhelmingly probable
unless the further initial condition S(t = 0) ≪ S_max held. This fact again
possesses its analogue in electrodynamics: not all sources must be absorbers in
order to prevent a trivial situation.
[Figure: information flow from the relevant channel ρ_rel(0) → ρ_rel(t) into the irrelevant channel ρ_irrel, shown between t = 0 and t = t₁.]
There exist many concepts of relevance (or "faces of entropy") suited for differ-
ent purposes. Best known are Gibbs' coarse graining and Boltzmann's restriction
to single-particle phase space (with the initial condition of absent particle correla-
tions referred to as molecular chaos). They depend even further on the concept of
particles used (elementary or compound, often changing during phase transitions
or chemical reactions). Two others, P_local and P_macro, will be considered in more
detail.
For most of the relevance concepts used, the condition ρ_irrel(0) = 0 does not
appear very physical, since it refers to the knowledge described by the ensembles.
Only some of them are effective on pure ("real") states, that is, they define a non-
trivial entropy or "representative ensemble" as a function of state. All of them
are based on a certain observer-relatedness: there is no objective reason for using
ensembles or a concept of relevance. Also, Kolmogorov's entropy is based on a
relevant measure of distance, while algorithms (used to define algorithmic entropy)
are based on a choice of relevant coordinates. Hence, what we call chaos may merely
be chaos to us!
Two Zwanzig projections will be of particular interest to illuminate the special
character of quantum aspects. The first one is the locality projection
P_local ρ =
The last statement is not true any more in quantum mechanics because of the
existence of the fundamental quantum correlations which lead to the violation of
the Bell inequality.
The second projection of special interest is defined by
412 H. D. Zeh
\[ P_{\rm macro}\,\rho(p, q) := \frac{p_\alpha}{V_\alpha} = {\rm const} \quad {\rm on} \quad \alpha(p, q) = {\rm const} , \]
where p_α is the probability and V_α the phase-space volume of the macroscopic cell α.
One may then form −k Σ_α p_α ln p_α and the mean of S(α) = k ln V_α,
which represent the "lacking information" about the macroscopic quantities and
the mean "physical entropy" (Planck's "number of complexions"),
respectively. This allows the deterministic transformation of physical entropy into
"lacking information" (thereby conserving the ensemble entropy as in Figure 1).
It is, in fact, part of Szilard's gedanken engine (Figure 6), where the transforma-
tion of entropy into lacking information renders the subdensities "robust." In its
quantum version, this first part of the procedure may require the production of
an additional, negligible but non-zero, amount of entropy in order to destroy the
quantum correlations between the two partial volumes.
\[ i\,\frac{\partial \rho}{\partial t} = L\rho = [H, \rho] , \]
with respect to a given (relevant) basis. Zwanzig's equation then becomes the
van Hove equation (with an additional Born approximation the Pauli equation,
or Fermi's Golden Rule after summing over final states). It has the form
\[ \frac{d\rho_{mm}}{dt} = \sum_n A_{mn}\,(\rho_{nn} - \rho_{mm}) , \]
with transition probabilities Amn analogous to Boltzmann's Stof3zahlansatz. The
meaning and validity of Zwanzig's approximation depends crucially on the choice
of the "relevant basis." For example, it would become trivial (Amn = 0) in the
exact energy basis.
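The Pauli-type master equation above can be sketched numerically. A hedged toy instance (random symmetric rates A_mn, Euler integration, all numbers arbitrary), checking the H-theorem behavior dS/dt ≥ 0 for the ensemble entropy:

```python
import numpy as np

# Hedged sketch of dp_m/dt = sum_n A_mn (p_n - p_m) with symmetric rates.
# We integrate a small random instance and check that the ensemble entropy
# S = -sum p ln p never decreases and approaches ln(n) at equilibrium.

rng = np.random.default_rng(1)
n = 8
A = rng.uniform(0.0, 1.0, (n, n))
A = 0.5 * (A + A.T)                     # symmetric rates A_mn = A_nm
np.fill_diagonal(A, 0.0)

p = rng.uniform(0.0, 1.0, n)
p /= p.sum()

dt, steps = 1e-3, 5000
entropies = []
for _ in range(steps):
    entropies.append(-np.sum(p * np.log(p)))
    p = p + dt * (A @ p - A.sum(axis=0) * p)   # dp_m = sum_n A_mn (p_n - p_m)
print(entropies[0], entropies[-1], np.log(n))
```

As the surrounding text stresses, what this toy model cannot show is the difference in *meaning*: in the quantum case the same equation describes fundamental indeterminism rather than ignorance of classical initial conditions.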
In spite of its formal analogy to the classical theory, the quantum master equa-
tion describes the fundamental quantum indeterminism—not only an apparent in-
determinism due to the lack of initial knowledge. For example, Pauli's equation is
identical to Born's original probability interpretation3 (which also introduced the
Born approximation). It was to describe probabilities for new wave functions (not
for classical particle positions), namely for the final states of the quantum jumps
between Schrödinger's stationary eigenstates of the Hamiltonians of noninteracting
local systems (which thus served as the dynamically "relevant basis"). Even these
days the eigenstates of the second (and recently also of the "third") quantization are
sometimes considered as a "natural" and therefore fundamental basis of relevance
to describe the collapse of the wave function as an objective process—although laser
physicists, of course, know better.
Hence, the analogy to the classical theory is misleading. The reason is that
the ensemble produced by the Zwanzig projection P from a pure state in general
does not contain this state itself any more. According to the very foundation of the
concept of the density matrix, it merely describes the probabilities for a collapse
into the original state (or from it into another state).
In order to see this, the measurement process has to be considered in more
detail. Following von Neumann's formulation one may write
\[ \Bigl(\sum_n c_n\psi_n\Bigr)\Phi_0 \;\longrightarrow\; \sum_n c_n\psi_n\Phi_n \;\longrightarrow\; \psi_{n_0}\Phi_{n_0} , \]
where the first step represents an appropriate interaction in accordance with the
Schrödinger equation and the second step the collapse. I have left out an interme-
diate step leading to the ensemble of potential final states with their corresponding
probabilities, since it describes only our ignorance of the outcome. The determin-
istic first step can again be realistic only if Φ represents the whole "rest of the
universe," including the apparatus and the observer. This is the quantum analogue
The corresponding master equation of local relevance requires some initial con-
dition like

(1 − P_local) ρ ≈ 0 .

This process contains the collapse, since the Schrödinger equation with a symmetric
Hamiltonian can only lead from the false vacuum (with Φ ≡ 0) to a symmetric superpo-
sition ∫ dΦ |0⟩_Φ. Unless the false vacuum has the same energy expectation value as
the physical vacuum, the state on the right-hand side must also contain excitations
(which, in fact, contribute to the "measurement" of the value of Φ characterizing a
specific vacuum).
Except for the Casimir/Unruh correlations the vacuum is a local state; that is,
it can approximately be written as the same vacuum at every place, |0⟩ ≈ Π_r |0⟩_r.
This is not true for the symmetric superposition ∫ dΦ |0⟩_Φ. Under the action of P_local,
this non-local state would instead lead to a mixed local density matrix ρ_r,

ρ_r ∝ ∫ dΦ |0⟩_Φ,r ⟨0|_Φ,r .

Only the collapse then leads to a local zero-entropy state again, since it transforms
a non-local state into a local state.
It appears suggestive that a similar mechanism created the degrees of freedom
represented by the realistic zero-mass particles (photons and gravitons). This would
correspond to the creation of a large unoccupied entropy capacity without deter-
ministic "causes" (which would otherwise have to be counted by the previous values
of the ensemble entropy as in Figure 1), or of "blank paper from nothing" by the
symmetry-breaking power of the collapse.
of the sun, and even events behind a spacetime horizon. Denying the Everett in-
terpretation (or considering its other branches as mere "possibilities") is hence just
another kind of solipsism!
This consideration emphasizes the observer-relatedness of the branching (and,
therefore, of entropy). A candidate for its precise formulation may be the Schmidt
canonical single-sum representation

Ψ(t) = Σ_n √p_n(t) φ_n(t) Φ_n(t) ,

with respect to any (local) observer system Φ. It is unique (except for degeneracy),
and therefore defines a "subjective" basis of relevance, although macroscopic prop-
erties contained in the φ_n seem to be objectivized by means of quantum correlations
and the "irreversible" action of decoherence.16
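Numerically, the Schmidt form is obtained from a singular-value decomposition of the state's coefficient matrix. The sketch below (with an arbitrary random state of our choosing) also checks that the p_n are the common eigenvalues of both reduced density matrices, which is what makes the representation unique up to degeneracy:

```python
import numpy as np

# Pure state of system x observer: psi[m, n] = <m, n | Psi>,
# here a random normalized 3x4 complex coefficient matrix.
rng = np.random.default_rng(0)
psi = rng.normal(size=(3, 4)) + 1j * rng.normal(size=(3, 4))
psi /= np.linalg.norm(psi)

# SVD gives the Schmidt (single-sum) form: Psi = sum_n sqrt(p_n) |phi_n>|Phi_n>
u, s, vh = np.linalg.svd(psi, full_matrices=False)
p = s**2                                  # Schmidt probabilities p_n, descending

# The p_n are the eigenvalues of both reduced density matrices
rho_sys = psi @ psi.conj().T              # trace over the observer system
rho_obs = psi.conj().T @ psi              # trace over the observed system
assert np.allclose(np.sort(np.linalg.eigvalsh(rho_sys))[::-1], p)
assert np.isclose(p.sum(), 1.0)
print(np.round(p, 4))
```

The columns of `u` and rows of `vh` are the (generically unique) bases φ_n and Φ_n; a degenerate spectrum p_n is the only case in which they are not fixed.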
H Ψ[(3)G, Φ] = 0 .

This equation does not allow one to impose an initial condition of low entropy in the
usual way. How, then, can correlations such as those which are required to define
the branching evolve?
The answer seems to be contained in the fact that the Wheeler-DeWitt Hamil-
tonian H is hyperbolic. For example, for Friedmann-type models with a massive
quantum field (with its homogeneous part called Φ) one has

H = ∂²/∂α² − ∂²/∂Φ² − ⋯ + V(α, Φ, ⋯) =: ∂²/∂α² + H_red ,

where the dots refer to the higher multipoles of geometry and matter on the Fried-
mann sphere.6 This allows one to impose an initial condition with respect to the
"intrinsic time" α = ln a, the logarithm of the Friedmann expansion parameter. The
reduced dynamics H_red defines an intrinsic determinism, although not, in general,
an intrinsic unitarity, since V(α, ⋯) may be negative somewhere.
Because of the absence, in the wave function, of a term exp(iωt), there is
no meaningful distinction between exp(+ikα) and exp(−ikα). (Halliwell—see his
contribution to this conference—has presented arguments that these components
decohere from one another.) So the intrinsic big bang is identical to the intrinsic big
crunch: they form one common, intrinsically initial, "big brunch."
From its solutions one may construct coherent wave tubes which approximately
define "orbits of causality" (see Figure 7) even when the actual wave function ex-
tends over the whole superspace. Similar behavior is found for other appropriate
potentials, although the wave packets in general show dispersion towards the turn-
ing point in α.10
Corresponding wave functions in high-dimensional superspace show of course
more complex behavior and may lead to an increasing branching with increasing
α if an "initial" condition of lacking correlations holds for α → −∞. If one then
formally follows a turning classical orbit in mini- or midi-superspace, one should
observe branching of the wave function for the microscopic variables on the ex-
pansion leg, but recombination (inverse branching) on the return leg. This point
of view is, however, merely a relic of the concept of classical orbits; the subjective
arrow of time should in each leg be determined by the thermodynamical one. Closer
to the turning point no clearly defined arrow can exist along the classical orbits,
although the situation there seems to be very different from thermal equilibrium.
The consequence of this (in the classical picture "double-ended") quantum ver-
sion of the Cosmic Censorship postulate for the formation of black holes is not yet
understood.17
FIGURE 7 Wave tubes ψ(α, Φ) of the anisotropic indefinite harmonic oscillator, approximately defining "orbits of causality." The symmetry between the two legs of the orbit is an artifact of this model; the inner structure of the tubes is not resolved by the plot.
Such an initial condition could simply be enforced by appropriate potential barriers for the multipole am-
plitudes. It would describe an initially "simple" (since unstructured) universe, al-
though not the ground state of the higher multipoles on the Friedmann sphere. Any
concept of ground or excited states could only be meaningful for them after they
have entered their domain of adiabaticity.
This conceivable existence of a completely symmetric pure initial state (instead
of a symmetric ensemble of very many states, the "real" member of which we were
then unable to determine from this initial condition) is a specific consequence of the
superposition principle, that is, of quantum mechanics. Before the "occurrence" of
the first collapse or branching, the universe would then not contain any non-trivial
degrees of freedom, or any potentiality of complexity.
This determination of the total wave function of the universe from its dynamics
depends of course on the behavior of the realistic potential V(α, Φ, ⋯) for α → −∞.
Since it refers to the Planck era, this procedure would require knowledge about a
completely unified quantum field theory. Hopefully, this property of the potential
may turn out to be a useful criterion to find one! An appropriate potential for the
higher modes would even be able to describe their effective cut-off at wavelengths
of the order of the Planck length (useful for a finite renormalization) at all times.
ACKNOWLEDGMENT
I wish to thank C. Kiefer and H. D. Conradi for their critical reading of the
manuscript. Financial help from the Santa Fe Institute is acknowledged. This con-
tribution was not supported by Deutsche Forschungsgemeinschaft.
Quantum Measurements 421
REFERENCES
1. Bennett, C. H. "Logical Reversibility of Computation." IBM J. Res. Dev. 17
(1973):525.
2. Borel, E. Le hasard. Paris: Alcan, 1924.
3. Born, M. "Das Adiabatenprinzip in der Quantenmechanik." Z. Physik 40
(1926):167.
4. DeWitt, B. S. "Quantum Theory of Gravity. I. The Canonical Theory." Phys.
Rev. 160 (1967):1113.
5. Einstein, A., and Ritz, W. "Zum gegenwärtigen Stand des Strahlungsprob-
lems." Phys. Z. 10 (1909):323.
6. Halliwell, J. J., and S. W. Hawking. "Origin of Structure in the Universe."
Phys. Rev. D31 (1985):1777.
7. Joos, E. "Continuous Measurement: Watchdog Effect versus Golden Rule."
Phys. Rev. D29 (1984):1626.
8. Joos, E., and H. D. Zeh, "The Emergence of Classical Properties through In-
teraction with the Environment." Z. Phys. B59 (1985):223.
9. Kiefer, C. "Continuous Measurement of Mini-Superspace Variables by Higher
Multipoles." Class. Qu. Gravity 4 (1987):1369.
10. Kiefer, C. "Wave Packets in Mini-Superspace." Phys. Rev. D38 (1988):1761.
11. Lubkin, E. "Keeping the Entropy of Measurement: Szilard Revisited." Intern.
J. Theor. Phys. 26 (1987):523.
12. Misra, B., and E. C. G. Sudarshan. "The Zeno's Paradox in Quantum The-
ory." J. Math. Phys. 18 (1977):756.
13. Padmanabhan, T. "Decoherence in the Density Matrix Describing Quantum
Three-Geometries and the Emergence of Classical Spacetime." Phys. Rev.
D39 (1989):2924. See also J. J. Halliwell's contribution to this conference.
14. Penrose, R. "Singularities and Time-Asymmetry." In General Relativity,
edited by S. W. Hawking and W. Israel. Cambridge: Cambridge University
Press, 1979.
15. Penrose, R. "Time Asymmetry and Quantum Gravity." In Quantum Gravity
2, edited by C. J. Isham, R. Penrose and D. W. Sciama. Oxford: Clarendon
Press, 1981.
16. Zeh, H. D. "On the Irreversibility of Time and Observation in Quantum The-
ory." In Enrico Fermi School of Physics IL, edited by B. d'Espagnat. New
York: Academic Press, 1971.
17. Zeh, H. D. "Einstein Nonlocality, Space-Time Structure, and Thermodynam-
ics." In Old and New Questions in Physics, Cosmology, Philosophy, and The-
oretical Biology, edited by A. van der Merwe. New York: Plenum, 1983.
18. Zeh, H. D. "Time in Quantum Gravity." Phys. Lett. A126 (1988):311.
19. Zeh, H. D. The Physical Basis of the Direction of Time. Heidelberg:
Springer, 1989.
QUANTUM MECHANICS IN THE LIGHT OF QUANTUM COSMOLOGY
Murray Gell-Mann and James B. Hartle
I. QUANTUM COSMOLOGY
If quantum mechanics is the underlying framework of the laws of physics, then
there must be a description of the universe as a whole and everything in it in
quantum-mechanical terms. In such a description, three forms of information are
needed to make predictions about the universe. These are the action function of the
elementary particles, the initial quantum state of the universe, and, since quantum
mechanics is an inherently probabilistic theory, the information available about
our specific history. These are sufficient for every prediction in science, and there
are no predictions that do not, at a fundamental level, involve all three forms of
information.
A unified theory of the dynamics of the basic fields has long been a goal of
elementary particle physics and may now be within reach. The equally fundamental,
equally necessary search for a theory of the initial state of the universe is the
objective of the discipline of quantum cosmology. These may even be related goals;
a single action function may describe both the Hamiltonian and the initial state.[1]
There has recently been much promising progress in the search for a theory
of the quantum initial condition of the universe.[2] Such diverse observations as
the large-scale homogeneity and isotropy of the universe, its approximate spatial
flatness, the spectrum of density fluctuations from which the galaxies grew, the
thermodynamic arrow of time, and the existence of classical spacetime may find a
unified, compressed explanation in a particular simple law of the initial condition.
The regularities exploited by the environmental sciences such as astronomy,
geology, and biology must ultimately be traceable to the simplicity of the initial
[1]As in the "no boundary" and the "tunneling from nothing" proposals, where the wave function
of the universe is constructed from the action by a Euclidean functional integral in the first case
or by boundary conditions on the implied Wheeler-DeWitt equation in the second. See, e.g., Refs.
27 and 53.
[2]For recent reviews see, e.g., J. J. Halliwell23 and J. B. Hartle.30,33 For a bibliography of papers
on quantum cosmology, see J. J. Halliwell.24
Quantum Mechanics in the Light of Quantum Cosmology 427
condition. Those regularities concern specific individual objects and not just re-
producible situations involving identical particles, atoms, etc. The fact that the
discovery of a bird in the forest or a fossil in a cliff or a coin in a ruin implies the
likelihood of discovering another similar bird or fossil or coin cannot be derived
from the laws of elementary particle physics alone; it must involve correlations that
stem from the initial condition.
The environmental sciences are not only strongly affected by the initial con-
ditions but are also heavily dependent on the outcomes of quantum-probabilistic
events during the history of the universe. The statistical results of, say, proton-
proton scattering in the laboratory are much less dependent on such outcomes.
However, during the last few years there has been increasing speculation that,
even in a unified fundamental theory, free of dimensionless parameters, some of
the observable characteristics of the elementary particle system may be quantum-
probabilistic, with a probability distribution that can depend on the initial condi-
tion.[3]
It is not our purpose in this article to review all these developments in quantum
cosmology. Rather, we will discuss the implications of quantum cosmology for one
of the subjects of this conference—the interpretation of quantum mechanics.
II. PROBABILITY
Even apart from quantum mechanics, there is no certainty in this world; therefore
physics deals in probabilities. In classical physics probabilities result from ignorance;
in quantum mechanics they are fundamental as well. In the last analysis, even when
treating ensembles statistically, we are concerned with the probabilities of particular
events. We then deal with the probabilities of deviations from the expected behavior
of the ensemble caused by fluctuations.
When the probabilities of particular events are sufficiently close to 0 or 1, we
make a definite prediction. The criterion for "sufficiently close to 0 or 1" depends on
the use to which the probabilities are put. Consider, for example, a prediction on the
basis of present astronomical observations that the sun will come up tomorrow at
5:59 AM ± 1 min. Of course, there is no certainty that the sun will come up at this
time. There might have been a significant error in the astronomical observations or
the subsequent calculations using them; there might be a non-classical fluctuation in
the earth's rotation rate or there might be a collision with a neutron star now racing
across the galaxy at near light speed. The prediction is the same as estimating the
probabilities of these alternatives as low. How low do they have to be before one
sleeps peacefully tonight rather than anxiously awaiting the dawn? The probabilities
[3]As, for example, in recent discussions of the value of the cosmological constant; see, e.g., S. W.
Hawking,35 S. Coleman,4 and S. Giddings and A. Strominger.20
428 Murray Gell-Mann and James B. Hartle
predicted by the laws of physics and the statistics of errors are generally agreed to
be low enough!
All predictions in science are, most honestly and most generally, the probabilis-
tic predictions of the time histories of particular events in the universe. In cosmology
we are necessarily concerned with probabilities for the single system that is the uni-
verse as a whole. Where the universe presents us effectively with an ensemble of
identical subsystems, as in experimental situations common in physics and chem-
istry, the probabilities for the ensemble as a whole yield definite predictions for the
statistics of identical observations. Thus, statistical probabilities can be derived, in
appropriate situations, from probabilities for the universe as a whole.13,26,21,11
Probabilities for histories need be assigned by physical theory only to the ac-
curacy to which they are used. Thus, it is the same to us for all practical purposes
whether physics claims the probability of the sun not coming up tomorrow is
10^(-1057) or 10^(-10^57), as long as it is very small. We can therefore conveniently consider ap-
proximate probabilities, which need obey the rules of the probability calculus only
up to some standard of accuracy sufficient for all practical purposes. In quantum
mechanics, as we shall see, it is likely that only by this means can probabilities be
assigned to interesting histories at all.
because

|ψ_L(y) + ψ_U(y)|^2 ≠ |ψ_L(y)|^2 + |ψ_U(y)|^2 .   (2)
If we have measured which slit the electron went through, then the interference is
destroyed, the sum rule obeyed, and we can meaningfully assign probabilities to
these alternative histories.
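The failure and restoration of this sum rule can be illustrated numerically with two toy Gaussian slit amplitudes (our illustrative model; the widths, separations, and phases are arbitrary):

```python
import numpy as np

y = np.linspace(-10, 10, 2001)             # positions on the screen

# Toy slit amplitudes at the screen (illustrative Gaussians with phase)
psi_U = np.exp(-(y - 1.0)**2) * np.exp(3j * y)
psi_L = np.exp(-(y + 1.0)**2) * np.exp(-3j * y)

# No which-slit record: probabilities interfere, Eq. (2)-style violation
p_coherent = np.abs(psi_U + psi_L)**2
p_sum = np.abs(psi_U)**2 + np.abs(psi_L)**2
print(np.max(np.abs(p_coherent - p_sum)))  # nonzero: sum rule violated

# Orthogonal which-slit records |U>, |L>: the cross term is multiplied
# by <U|L> = 0, so the diagonal sum rule is restored exactly.
overlap = 0.0
p_decohered = p_sum + 2 * np.real(np.conj(psi_U) * psi_L) * overlap
assert np.allclose(p_decohered, p_sum)
```

The cross term 2 Re(ψ_U* ψ_L ⟨U|L⟩) is the off-diagonal element whose suppression by a which-slit measurement is the simplest instance of the decoherence condition discussed below.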
It is a general feature of quantum mechanics that one needs a rule to determine
which histories can be assigned probabilities. The familiar rule of the "Copenhagen"
interpretations described above is external to the framework of wave function and
FIGURE 1 The two-slit experiment. An electron gun at right emits an electron traveling
towards a screen with two slits, its progress in space recapitulating its evolution in time.
When precise detections are made of an ensemble of such electrons at the screen it
is not possible, because of interference, to assign a probability to the alternatives of
whether an individual electron went through the upper slit or the lower slit. However, if
the electron interacts with apparatus which measures which slit it passed through, then
these alternatives decohere and probabilities can be assigned.
[4]See the essays "The Unity of Knowledge" and "Atoms and Human Knowledge," reprinted in
N. Bohr.2
[5]For clear statements of this point of view see F. London and E. Bauer, and R. B. Peierls.
the early universe when neither existed. There is no reason in general for a classical
domain to be fundamental or external in a basic formulation of quantum mechanics.
It was Everett who in 1957 first suggested how to generalize the Copenhagen
framework so as to apply quantum mechanics to cosmology.[6] His idea was to take
quantum mechanics seriously and apply it to the universe as a whole. He showed
how an observer could be considered part of this system and how its activities—
measuring, recording, and calculating probabilities—could be described in quantum
mechanics.
Yet the Everett analysis was not complete. It did not adequately explain the
origin of the classical domain or the meaning of the "branching" that replaced the
notion of measurement. It was a theory of "many worlds" (what we would rather
call "many histories"), but it did not sufficiently explain how these were defined
or how they arose. Also, Everett's discussion suggests that a probability formula is
somehow not needed in quantum mechanics, even though a "measure" is introduced
that, in the end, amounts to the same thing.
Here we shall briefly sketch a program aiming at a coherent formulation of
quantum mechanics for science as a whole, including cosmology as well as the en-
vironmental sciences.[7] It is an attempt at extension, clarification, and completion
of the Everett interpretation. It builds on many aspects of the post-Everett de-
velopments, especially the work of Zeh,56 Zurek,58,59 and Joos and Zeh.37 In the
discussion of history and at other points it is consistent with the insightful work (in-
dependent of ours) of Griffiths22 and Omnes.46,47,48 Our research is not complete,
but we sketch, in this report on its status, how it might become so.
[6]The original paper is by Everett.10 The idea was developed by many, among them Wheeler,
DeWitt, Geroch,19 and Mukhanov, and independently arrived at by others, e.g., Gell-Mann
and Cooper and Van Vechten.5 There is a useful collection of early papers on the subject in Ref. 8.
[7]Some elements of which have been reported earlier. See M. Gell-Mann.17
its apparatus of Hilbert space, states, Hamiltonian, and other operators. We shall
indicate the equivalence between the two, always possible in this approximation.
The approximation of a fixed background spacetime breaks down in the early
universe. There, a yet more fundamental sum-over-histories framework of quantum
mechanics may be necessary. In such a framework the notions of state, operators,
and Hamiltonian may be approximate features appropriate to the universe after
the Planck era, for particular initial conditions that imply an approximately fixed
background spacetime there. A discussion of quantum spacetime is essential for
any detailed theory of the initial condition, but when, as here, this condition is not
spelled out in detail and we are treating events after the Planck era, the familiar
formulation of quantum mechanics is an adequate approximation.
The interpretation of quantum mechanics that we shall describe in connection
with cosmology can, of course, also apply to any strictly closed sub-system of the
universe provided its initial density matrix is known. However, strictly closed sub-
systems of any size are not easily realized in the universe. Even slight interactions,
such as those of a planet with the cosmic background radiation, can be important
for the quantum mechanics of a system, as we shall see. Further, it would be ex-
traordinarily difficult to prepare precisely the initial density matrix of any sizeable
system so as to get rid of the dependence on the density matrix of the universe. In
fact, even those large systems that are approximately isolated today inherit many
important features of their effective density matrix from the initial condition of the
universe.
(B) HISTORIES
The three forms of information necessary for prediction in quantum cosmology
are represented in the Heisenberg picture as follows: The quantum state of the
universe is described by a density matrix p. Observables describing specific infor-
mation are represented by operators 0(t). For simplicity, but without loss of gen-
erality, we shall focus on non-"fuzzy", "yes-no" observables. These are represented
in the Heisenberg picture by projection operators P(t). The Hamiltonian, which
is the remaining form of information, describes evolution by relating the operators
corresponding to the same question at different times through

P_α^k(t) = e^{iHt/ħ} P_α^k(0) e^{−iHt/ħ} ,

where k labels the set, α the particular alternative, and t its time. An exhaustive set
of exclusive alternatives satisfies

Σ_α P_α^k(t) = 1 ,   P_α^k(t) P_α'^k(t) = δ_αα' P_α^k(t) .   (3)
For example, one such exhaustive set would specify whether a field at a point on a
surface of constant t is in one or another of a set of ranges exhausting all possible
values. The projections are simply the projections onto eigenstates of the field at
that point with values in these ranges. We should emphasize that an exhaustive set
of projections need not involve a complete set of variables for the universe (one-
dimensional projections)—in fact, the projections we deal with as observers of the
universe typically involve only an infinitesimal fraction of a complete set.
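In a finite-dimensional toy model (ours, not from the text), an exhaustive and exclusive set of range projections is easy to construct and check against these conditions:

```python
import numpy as np

dim = 8
x = np.arange(dim)                 # eigenvalues of a discretized "field" variable

# Ranges exhausting all values: {0..2}, {3..4}, {5..7}
ranges = [(0, 3), (3, 5), (5, 8)]
P = [np.diag(((x >= lo) & (x < hi)).astype(float)) for lo, hi in ranges]

# Exhaustive: the projections sum to the identity, as in Eq. (3)
assert np.allclose(sum(P), np.eye(dim))
# Exclusive: P_a P_b = delta_ab P_a
for a, Pa in enumerate(P):
    for b, Pb in enumerate(P):
        assert np.allclose(Pa @ Pb, Pa if a == b else 0.0)
```

Each projector here is far from one-dimensional, mirroring the remark that realistic observers only ever project onto an infinitesimal fraction of a complete set of variables.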
Sets of alternative histories consist of time sequences of exhaustive sets of al-
ternatives. A history is a particular sequence of alternatives, abbreviated [P_α] =
(P^1_{α_1}(t_1), P^2_{α_2}(t_2), ⋯, P^n_{α_n}(t_n)). A completely fine-grained history is specified by giv-
ing the values of a complete set of operators at all times. One history is a coarse
graining of another if the set [Pa] of the first history consists of sums of the [Pa]
of the second history. The inverse relation is fine graining. The completely coarse-
grained history is one with no projections whatever, just the unit operator!
The reciprocal relationships of coarse and fine graining evidently constitute
only a partial ordering of sets of alternative histories. Two arbitrary sets need not
be related to each other by coarse/fine graining. The partial ordering is represented
schematically in Figure 2, where each point stands for a set of alternative histories.
Feynman's sum-over-histories formulation of quantum mechanics begins by
specifying the amplitude for a completely fine-grained history in a particular basis
of generalized coordinates Q^i(t), say all fundamental field variables at all points in
space. This amplitude is proportional to

exp(iS[Q^i(t)]/ħ) ,   (5)
where S is the action functional that yields the Hamiltonian, H. When we employ
this formulation of quantum mechanics, we shall introduce the simplification of
ignoring fields with spins higher than zero, so as to avoid the complications of gauge
groups and of fermion fields (for which it is inappropriate to discuss eigenstates of
the field variables). The operators Q^i(t) are thus various scalar fields at different
points of space.
Let us now specialize our discussion of histories to the generalized coordinate
bases Q^i(t) of the Feynman approach. Later we shall discuss the necessary general-
ization to the case of an arbitrary basis at each time t, utilizing quantum-mechanical
transformation theory.
Completely fine-grained histories in the coordinate basis cannot be assigned
probabilities; only suitable coarse-grained histories can. There are at least three
common types of coarse graining: (1) specifying observables not at all times, but
FIGURE 2 The schematic structure of the space of sets of possible histories for the
universe. Each dot in this diagram represents an exhaustive set of alternative histories.
Such sets, denoted by {[P_α]} in the text, correspond in the Heisenberg picture to time
sequences (P^1_{α_1}(t_1), P^2_{α_2}(t_2), ⋯, P^n_{α_n}(t_n)) of sets of projection operators, such that at
each time t k the alternatives ak are an orthogonal and exhaustive set of possibilities
for the universe. At the bottom of the diagram are the completely fine-grained sets
of histories, each arising from taking projections onto eigenstates of a complete set
of observables for the universe at every time. For example, the set Q is the set in
which all field variables at all points of space are specified at every time. P might be
the completely fine-grained set in which all field momenta are specified at each time.
D might be a degenerate set of the kind discussed in Section VII in which the same
complete set of operators occurs at every time. But there are many other completely
fine-grained sets of histories corresponding to all possible combinations of complete
sets of observables that can be taken at every time.
The dots above the bottom row are coarse-grained sets of alternative histories.
If two dots are connected by a path, the one above is a coarse graining of the one
below—that is, the projections in the set above are sums of those in the set below.
At the very top is the degenerate case in which complete sums are taken at every
time, yielding no projections at all other than the unit operator! The space of sets of
alternative histories is thus partially ordered by the operation of coarse graining.
The heavy dots denote the decoherent sets of alternative histories. Coarse
grainings of decoherent sets remain decoherent. Maximal sets, the heavy dots
surrounded by circles, are those decohering sets for which there is no finer-grained
decoherent set.
only at some times; (2) specifying at any one time not a complete set of observables,
but only some of them; (3) specifying for these observables not precise values, but
only ranges of values. To illustrate all three, let us divide the Q^i up into variables
x^i and X^i and consider only sets of ranges {Δ_α^k} of the x^i at times t_k, k = 1, ⋯, n.
A set of alternatives at any one time consists of ranges Δ_α^k, which exhaust the
possible values of x^i as α ranges over all integers. An individual history is specified
by particular Δ_α's at the times t_1, ⋯, t_n. We write [Δ_α] = (Δ^1_{α_1}, ⋯, Δ^n_{α_n}) for a
particular history. A set of alternative histories is obtained by letting α_1 ⋯ α_n
range over all values.
Let us use the same notation [Δ_α] for the most general history that is a coarse
graining of the completely fine-grained history in the coordinate basis, specified by
ranges of the Q^i at each time, including the possibility of full ranges at certain
times, which eliminate those times from consideration.
In the sum-over-histories formulation, the decoherence functional for a pair of
completely fine-grained histories is

D[Q'^i(t), Q^i(t)] = δ(Q'_f − Q_f) exp{i(S[Q'^i(t)] − S[Q^i(t)])/ħ} ρ(Q'_0, Q_0) ,   (6)

and for a pair of coarse-grained histories [Δ_α'], [Δ_α] it is

D([Δ_α'], [Δ_α]) = ∫_[Δ_α'] δQ' ∫_[Δ_α] δQ δ(Q'_f − Q_f) exp{i(S[Q'^i(t)] − S[Q^i(t)])/ħ} ρ(Q'_0, Q_0) .   (7)

More precisely, the integral is as follows (Figure 3): It is over all histories Q'^i(t),
Q^i(t) that begin at Q'_0, Q_0 respectively, pass through the ranges [Δ_α'] and [Δ_α]
respectively, and wind up at a common point Q_f at any time t_f > t_n. It is completed
by integrating over Q'_0, Q_0, and Q_f.
The connection between coarse-grained histories and completely fine-grained
ones is transparent in the sum-over-histories formulation of quantum mechanics.
In the Heisenberg picture, the corresponding decoherence functional is

D([P_α'], [P_α]) = Tr[P^n_{α'_n}(t_n) ⋯ P^1_{α'_1}(t_1) ρ P^1_{α_1}(t_1) ⋯ P^n_{α_n}(t_n)] .   (8)

The projections in Eq. (8) are time ordered with the earliest on the inside. When
the P's are projections onto ranges Δ of values of the Q's, expressions (7) and (8)
agree. From the cyclic property of the trace it follows that D is always diagonal
in the final indices α'_n and α_n. (We assume throughout that the P's are bounded
operators in Hilbert space, dealing, for example, with projections onto ranges of the
Q's and not onto definite values of the Q's.) Decoherence is thus an interesting
notion only for strings of P's that involve more than one time. Decoherence is
automatic for "histories" that consist of alternatives at but one time.
Progressive coarse graining may be seen in the sum-over-histories picture as
summing over those parts of the fine-grained histories not specified in the coarse-
grained one, according to the principle of superposition. In the Heisenberg picture,
Eq. (8), the three common forms of coarse graining discussed above can be repre-
sented as follows: Summing on both sides of D over all P's at a given time and
using Eq. (3) eliminates those P's completely. Summing over all possibilities for
certain variables at one time amounts to factoring the P's and eliminating one of
the factors by summing over it. Summing over ranges of values of a given variable
at a given time corresponds to replacing the P's for the partial ranges by one for
the total range. Thus, if [P̄_β] is a coarse graining of the set of histories {[P_α]}, we
write

D([P̄_β'], [P̄_β]) = Σ_{all P_α' fixed by [P̄_β']} Σ_{all P_α fixed by [P̄_β]} D([P_α'], [P_α]) .   (9)
In the most general case, we may think of the completely fine-grained limit as
obtained from the coordinate representation by arbitrary unitary transformations
at all times. All histories can be obtained by coarse-graining the various completely
fine-grained ones, and coarse graining in its most general form involves taking ar-
bitrary sums of P's, as discussed earlier. We may use Eq. (9) in the most general
case where [P̄_β] is a coarse graining of [P_α].
A set of coarse-grained alternative histories is said to decohere when the off-
diagonal elements of D are sufficiently small:

D([P_α'], [P_α]) ≈ 0 ,   for any α'_k ≠ α_k .   (10)
This is a generalization of the condition for the absence of interference in the two-
slit experiment (approximate equality of the two sides of Eq. (2)). It is a sufficient
(although not a necessary) condition for the validity of the purely diagonal formula
D([P̄_β], [P̄_β]) ≈ Σ_{all P_α fixed by [P̄_β]} D([P_α], [P_α]) .   (11)
The rule for when probabilities can be assigned to histories of the universe is
then this: To the extent that a set of alternative histories decoheres, probabili-
ties can be assigned to its individual members. The probabilities are the diagonal
elements of D. Thus,
p([P_α]) = D([P_α], [P_α])
= Tr[P^n_{α_n}(t_n) ⋯ P^1_{α_1}(t_1) ρ P^1_{α_1}(t_1) ⋯ P^n_{α_n}(t_n)]   (12)
when the set decoheres. We will frequently write p(antn , • • • , al ti ) for these prob-
abilities, suppressing the labels of the sets.
The probabilities defined by Eq. (12) obey the rules of probability theory as a
consequence of decoherence. The principal requirement is that the probabilities be
additive on "disjoint sets of the sample space". For histories this gives the sum rule
p([P̄_β]) = Σ_{all P_α fixed by [P̄_β]} p([P_α]) .   (13)
These relate the probabilities for a set of histories to the probabilities for all coarser
grained sets that can be constructed from it. For example, the sum rule eliminating
all projections at only one time is

Σ_{α_k} p(α_n t_n, ⋯, α_{k+1} t_{k+1}, α_k t_k, α_{k−1} t_{k−1}, ⋯, α_1 t_1)
= p(α_n t_n, ⋯, α_{k+1} t_{k+1}, α_{k−1} t_{k−1}, ⋯, α_1 t_1) .   (14)
These rules follow trivially from Eqs. (11) and (12). The other requirements from
probability theory are that the probability of the whole sample space be unity, an
easy consequence of Eq. (11) when complete coarse graining is performed, and that
the probability for an empty set be zero, which means simply that the probability
of any sequence containing a projection P = 0 must vanish, as it does.
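The decoherence condition and the resulting probability rules can be checked in a toy model. The following sketch (a single qubit with trivial dynamics, H = 0, so Heisenberg and Schrödinger projections coincide; all choices here are illustrative assumptions, not taken from the text) builds the decoherence functional of Eq. (8) for two-time histories, verifies that one ordering of alternatives decoheres as in Eq. (10) with diagonal probabilities summing to unity as required by Eq. (12), while another set exhibits interference:

```python
import numpy as np

# Toy check of Eqs. (8), (10), (12): decoherence functional for two-time
# histories of a single qubit.  The choice H = 0 (so Heisenberg projections
# equal Schrodinger ones) and the bases used are illustrative assumptions.

ket0 = np.array([1.0, 0.0]); ket1 = np.array([0.0, 1.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2); ketm = np.array([1.0, -1.0]) / np.sqrt(2)
proj = lambda v: np.outer(v, v.conj())

rho = proj(ket0)                      # pure initial condition
z = [proj(ket0), proj(ket1)]          # alternatives in the z basis
x = [proj(ketp), proj(ketm)]          # alternatives in the x basis

def D(hist_a, hist_b, rho):
    """Decoherence functional of Eq. (8): Tr[P2 P1 rho P1' P2']."""
    (P1a, P2a), (P1b, P2b) = hist_a, hist_b
    return np.trace(P2a @ P1a @ rho @ P1b @ P2b)

# z alternatives at t1, x alternatives at t2: this set decoheres exactly.
hists = [(P1, P2) for P1 in z for P2 in x]
Dmat = np.array([[D(a, b, rho) for b in hists] for a in hists])
assert np.allclose(Dmat, np.diag(np.diag(Dmat)))        # Eq. (10)
probs = np.real(np.diag(Dmat))
assert np.isclose(probs.sum(), 1.0)                     # Eq. (12) diagonal

# Reversed order (x alternatives first) interferes: no decoherence.
hists2 = [(P1, P2) for P1 in x for P2 in z]
Dmat2 = np.array([[D(a, b, rho) for b in hists2] for a in hists2])
assert not np.allclose(Dmat2, np.diag(np.diag(Dmat2)))
```

The second set fails to decohere for the same reason the two-slit amplitudes interfere: the initial state connects the off-diagonal pairs of x alternatives.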
The p([P_\alpha]) are approximate probabilities for histories, in the sense of Section
II, up to the standard set by decoherence. Conversely, if a given standard for the
probabilities is required by their use, it can be met by coarse graining until Eqs.
(10) and (13) are satisfied at the requisite level.
Further coarse graining of a decoherent set of alternative histories produces
another set of decoherent histories since the probability sum rules continue to be
obeyed. That is illustrated in Figure 2, which makes it clear that in a progression
from the trivial completely coarse-grained set toward completely fine-grained sets,
there are sets of histories for which further fine graining always results in loss of
decoherence.
These are the maximal sets of alternative decohering histories.
These rules for probability exhibit another important feature: The operators
in Eq. (12) are time-ordered. Were they not time-ordered (zig-zags) we could have
assigned non-zero probabilities to conflicting alternatives at the same time. The
time ordering thus expresses causality in quantum mechanics, a notion that is ap-
propriate here because of the approximation of fixed background spacetime. The
time ordering is related as well to the "arrow of time" in quantum mechanics, which
we discuss below.
Given this discussion, the fundamental formula of quantum mechanics may be
reasonably taken to be

D([P_{\alpha'}], [P_\alpha]) \approx \delta_{\alpha'\alpha}\, p([P_\alpha]) \qquad (15)

for all [P_\alpha] in a set of alternative histories. Vanishing of the off-diagonal elements of
D gives the rule for when probabilities may be consistently assigned. The diagonal
elements give their values.
We could have used a weaker condition than Eq. (10) as the definition of de-
coherence, namely the necessary condition for the validity of the sum rules (11) of
probability theory:

D([P_{\alpha'}], [P_\alpha]) + D([P_\alpha], [P_{\alpha'}]) \approx 0 \qquad (16)

for any \alpha'_k \neq \alpha_k, or equivalently

\mathrm{Re}\, D([P_{\alpha'}], [P_\alpha]) \approx 0 . \qquad (17)
This is the condition used by Griffiths22 as the requirement for "consistent histo-
ries". However, while, as we shall see, it is easy to identify physical situations in
which the off-diagonal elements of D approximately vanish as the result of coarse
graining, it is hard to think of a general mechanism that suppresses only their real
parts. In the usual analysis of measurement, the off-diagonal parts of D approx-
imately vanish. We shall, therefore, explore the stronger condition of Eq. (10) in
what follows. That difference should not obscure the fact that in this part of our
work we have reproduced what is essentially the approach of Griffiths,22 extended
by Omnès.46,47,48
For example, the probability for predicting alternatives \alpha_{k+1}, \ldots, \alpha_n, given that the
alternatives \alpha_1, \ldots, \alpha_k have already happened, is

p(\alpha_n t_n, \ldots, \alpha_{k+1} t_{k+1} \mid \alpha_k t_k, \ldots, \alpha_1 t_1) = \frac{p(\alpha_n t_n, \ldots, \alpha_1 t_1)}{p(\alpha_k t_k, \ldots, \alpha_1 t_1)} . \qquad (19)

The probability that \alpha_{n-1}, \ldots, \alpha_1 happened in the past, given present data sum-
marized by an alternative \alpha_n at the present time t_n, is

p(\alpha_{n-1} t_{n-1}, \ldots, \alpha_1 t_1 \mid \alpha_n t_n) = \frac{p(\alpha_n t_n, \ldots, \alpha_1 t_1)}{p(\alpha_n t_n)} . \qquad (20)
Decoherence ensures that the probabilities defined by Eqs. (18)-(20) will approxi-
mately add to unity when summed over all remaining alternatives, because of Eq.
(14).
Despite the similarity between Eqs. (19) and (20), there are differences between
prediction and retrodiction. Future predictions can all be obtained from an effective
density matrix summarizing information about what has happened. If \rho_{\rm eff} is defined
by

\rho_{\rm eff} = \frac{P^k_{\alpha_k}(t_k) \cdots P^1_{\alpha_1}(t_1)\,\rho\,P^1_{\alpha_1}(t_1) \cdots P^k_{\alpha_k}(t_k)}{\mathrm{Tr}[P^k_{\alpha_k}(t_k) \cdots P^1_{\alpha_1}(t_1)\,\rho\,P^1_{\alpha_1}(t_1) \cdots P^k_{\alpha_k}(t_k)]} \qquad (21)

then

p(\alpha_n t_n, \ldots, \alpha_{k+1} t_{k+1} \mid \alpha_k t_k, \ldots, \alpha_1 t_1)
= \mathrm{Tr}[P^n_{\alpha_n}(t_n) \cdots P^{k+1}_{\alpha_{k+1}}(t_{k+1})\,\rho_{\rm eff}\,P^{k+1}_{\alpha_{k+1}}(t_{k+1}) \cdots P^n_{\alpha_n}(t_n)] . \qquad (22)
It has been suggested28,29,31,32 that, for application to highly quantum-mechanical spacetime,
as in the very early universe, quantum mechanics should be generalized to yield a framework in
which both time orderings are treated simultaneously in the sum-over-histories approach. This
involves including both exp(iS) and exp(-iS) for each history and has as a consequence an
evolution equation (the Wheeler-DeWitt equation) that is second order in the time variable. The
suggestion is that the two time orderings decohere when the universe is large and spacetime
classical, so that the usual framework with just one ordering is recovered.
440 Murray Gell-Mann and James B. Hartle
simple condition in what we call "the past". For example, the indicated present
homogeneity of the thermodynamic arrow of time can be traced to the near homo-
geneity of the "early" universe implied by p and the implication that the progenitors
of approximately isolated subsystems started out far from equilibrium at "early"
times.
Much has been made of the updating of the fundamental probability formula in
Eq. (19) and in Eqs. (21) and (22). By utilizing Eq. (21) the process of prediction
may be organized so that for each time there is a \rho_{\rm eff} from which probabilities
for the future may be calculated. The action of each projection, P, on both sides
of p in Eq. (21) along with the division by the appropriate normalizing factor is
then sometimes called the "reduction of the wave packet". But this updating of
probabilities is no different from the classical reassessment of probabilities that
occurs after new information is obtained. In a sequence of horse races, the joint
probability for the winners of eight races is converted, after the winners of the
first three are known, into a reassessed probability for the remaining five races by
exactly this process. The main thing is that, because of decoherence, the sum rules
for probabilities are obeyed; once that is true, reassessment of probabilities is trivial.
The only non-trivial aspect of the situation is the choice of the string of P's in
Eq. (8) giving a decoherent set of histories.
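The horse-race analogy can be made concrete in a few lines: the "reduction" of Eqs. (19) and (21) is numerically nothing but conditioning a joint distribution on known outcomes. In this sketch the number of races and horses, and the randomly generated joint distribution, are illustrative assumptions:

```python
import itertools
import numpy as np

# The horse-race analogy made concrete: "reduction of the wave packet"
# updates probabilities exactly as classical conditioning does, Eq. (19).
# Three races with two horses each, and a random joint distribution,
# are illustrative assumptions.

rng = np.random.default_rng(0)
outcomes = list(itertools.product([0, 1], repeat=3))   # winners of 3 races
p_joint = rng.dirichlet(np.ones(len(outcomes)))        # some joint distribution
joint = dict(zip(outcomes, p_joint))

# After race 1 is won by horse 0, reassess the remaining races as in
# Eq. (19): keep consistent outcomes, divide by the probability of what
# actually happened.
p_first = sum(p for o, p in joint.items() if o[0] == 0)
reassessed = {o[1:]: p / p_first for o, p in joint.items() if o[0] == 0}

assert np.isclose(sum(reassessed.values()), 1.0)       # still normalized
```

Nothing quantum-mechanical enters this step; what is quantum-mechanical, as the text stresses, is the decoherence that licenses assigning the joint probabilities in the first place.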
When the initial density matrix is pure,

\rho = |\Psi\rangle\langle\Psi| . \qquad (23)

The initial state may be decomposed according to the projection operators that
define the set of alternative histories:

|\Psi\rangle = \sum_{\alpha_1, \ldots, \alpha_n} P^n_{\alpha_n}(t_n) \cdots P^1_{\alpha_1}(t_1)\,|\Psi\rangle \equiv \sum_{\alpha_1, \ldots, \alpha_n} |[P_\alpha], \Psi\rangle . \qquad (24)
If the P^n_{\alpha_n}(t_n) for the last time t_n in Eq. (8) were all projections onto pure states, D would
factor for a pure \rho and could never satisfy Eq. (10), except for certain special kinds of
histories described near the end of Section VII, in which decoherence is automatic,
independent of p. Similarly, it is not difficult to show that some coarse graining is
required at any time in order to have decoherence of previous alternatives, with the
same set of exceptions.
After normalization, the states |[P_\alpha], \Psi\rangle represent the individual histories or
individual branches in the decohering set. We may, as for the effective density
matrix of Eq. (21), summarize present information for prediction just by giving one
of these states, with projections up to the present.
{\tilde P^k_\alpha} will have an identical decoherence functional to the sets constructed from the
corresponding {P^k_\alpha}. If one set decoheres, the other will, and the probabilities for
the individual histories will be the same.
In a similar way, decoherence and probabilities are invariant under arbitrary
reassignments of the times in a string of P's (as long as they continue to be ordered),
with the projection operators at the altered times unchanged as operators in Hilbert
space. This is because in the Heisenberg picture every projection is at any time a
projection operator for some quantity.
The histories arising from constant unitary transformations or from reassign-
ment of times of a given set of P's will, in general, have very different descriptions
in terms of fundamental fields from that of the original set. We are considering
transformations such as Eq. (27) in an active (or alibi) sense so that the field op-
erators and Hamiltonian are unchanged. (The passive (or alias) transformations,
in which these are transformed, are easily understood.) A set of projections onto
the ranges of field values in a spatial region is generally transformed by Eq. (27) or
by any reassignment of the times into an extraordinarily complicated combination
of all fields and all momenta at all positions in the universe! Histories consisting
of projections onto values of similar quantities at different times can thus become
histories of very different quantities at various other times.
In ordinary presentations of quantum mechanics, two histories with different
descriptions can correspond to physically distinct situations because it is presumed
that various different Hermitian combinations of field operators are potentially mea-
surable by different kinds of external apparatus. In quantum cosmology, however,
apparatus and system are considered together and the notion of physically distinct
situations may have a different character.
commuting projection operators, and the factors of P's for different times often fail
to commute with one another, for example, factors that are projections onto related
ranges of values of the same Heisenberg operator at different times. However, these
non-commuting factors may be correlated, given p, with other projection factors
that do commute or, at least, effectively commute inside the trace with the density
matrix p in Eq. (8) for the decoherence functional. In fact, these other projection
factors may commute with all the subsequent P's and thus allow themselves to be
moved to the outside of the trace formula. When all the non-commuting factors are
correlated in this manner with effectively commuting ones, then the off-diagonal
terms in the decoherence functional vanish, in other words, decoherence results. Of
course, all this behavior may be approximate, resulting in approximate decoherence.
This type of situation is fundamental in the interpretation of quantum me-
chanics. Non-commuting quantities, say at different times, may be correlated with
commuting or effectively commuting quantities because of the character of p and
H, and thus produce decoherence of strings of P's despite their non-commutation.
For a pure p, for example, the behavior of the effectively commuting variables leads
to the orthogonality of the branches of the state IIP), as defined in Eq. (24). We
shall see that correlations of this character are central to understanding historical
records (Section X) and measurement situations (Section XI).
As an example of decoherence produced by this mechanism, consider a coarse-
grained set of histories defined by time sequences of alternative approximate local-
izations of a massive body such as a planet or even a typical interstellar dust grain.
As shown by Joos and Zeh,37 even if the successive localizations are spaced as closely
as a nanosecond, such histories decohere as a consequence of scattering by the 3° K
cosmic background radiation (if for no other reason). Different positions become
correlated with nearly orthogonal states of the photons. More importantly, each al-
ternative sequence of positions becomes correlated with a different orthogonal state
of the photons at the final time. This accomplishes the decoherence and we may
loosely say that such histories of the position of a massive body are "decohered"
by interaction with the photons of the background radiation.
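The mechanism just described can be caricatured numerically: if each scattered photon ends up in a state only slightly different for the two alternative positions, the off-diagonal element of the decoherence functional is weighted by the product of the single-photon overlaps and so falls exponentially with the number of scattering events. The per-photon overlap and the event counts below are illustrative assumptions, not figures from Joos and Zeh:

```python
import numpy as np

# Sketch of environment-induced suppression: a per-photon overlap
# cos(theta) close to 1 still gives exponential decay of the off-diagonal
# weight |<E_x'|E_x>| = cos(theta)**N after N scatterings.
# theta and the values of N are illustrative assumptions.

theta = 0.01                        # per-photon distinguishability (assumed)
overlap = np.cos(theta)

for N in (1_000, 100_000, 10_000_000):
    suppression = overlap ** N      # off-diagonal weight after N events
    print(N, suppression)
```

Even a tiny per-event distinguishability, compounded over the enormous scattering rates of a macroscopic body, drives the off-diagonal terms to effectively zero, which is the content of the statement that such histories are "decohered" by the background photons.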
Other specific models of decoherence have been discussed by many authors,
among them Joos and Zeh,37 Caldeira and Leggett,3 and Zurek. Typically these
discussions have focussed on a coarse graining that involves only certain variables
analogous to the position variables above. Thus the emphasis is on particular non-
commuting factors of the projection operators and not on correlated operators that
may be accomplishing the approximate decoherence. Such coarse grainings do not,
in general, yield the most refined approximately decohering sets of histories, since
one could include projections onto ranges of values of the correlated operators
without losing the decoherence.
The simplest model consists of a single oscillator interacting bilinearly with a
large number of others, and a coarse graining which involves only the coordinates of
the special oscillator. Let x be the coordinate of the special oscillator, M its mass,
\omega_R its frequency renormalized by its interactions with the others, and S_{\rm free} its free
action. Consider the special case where the density matrix of the whole system,
referred to an initial time, factors into the product of a density matrix p(x', x) of
the distinguished oscillator and another for the rest. Then, generalizing slightly a
treatment of Feynman and Vernon,12 we can write D defined by Eq. (7) as
the intervals [\Delta_\alpha] referring here only to the variables of the distinguished oscillator.
The sum over the rest of the oscillators has been carried out and is summarized by
the Feynman-Vernon influence functional exp(iW [x' (t), x(t)]). The remaining sum
over x'(t) and x(t) is as in Eq. (7).
The case when the other oscillators are in an initial thermal distribution has
been extensively investigated by Caldeira and Leggett.3 In the simple limit of a
uniform continuum of oscillators cut off at frequency \Omega and in the Fokker-Planck
limit of kT \gg \hbar\Omega \gg \hbar\omega_R, they find
What the above models convincingly show is that decoherence will be wide-
spread in the universe for certain familiar "classical" variables. The answer to
Fermi's question to one of us of why we don't see Mars spread out in a quan-
tum superposition of different positions in its orbit is that such a superposition
would rapidly decohere. We now proceed to a more detailed discussion of such
decoherence.
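The scale of the effect can be illustrated with the standard high-temperature Caldeira-Leggett result (not quoted in full above), in which off-diagonal elements of the reduced density matrix decay as exp[-(2M\gamma kT/\hbar^2)(x - x')^2 t], so that the decoherence time is shorter than the relaxation time 1/\gamma by the dimensionless factor computed below. The mass, temperature, and separation are illustrative assumptions:

```python
# Rough numerical illustration of why a "Mars in superposition" decoheres.
# Uses the standard high-temperature Caldeira-Leggett decay
# exp[-(2 M gamma k T / hbar^2)(x - x')^2 t]; then
# tau_decoherence / tau_relaxation = hbar^2 / (2 M k T dx^2).
# The body (1 g at 300 K, positions 1 cm apart) is an illustrative assumption.

hbar, k = 1.054571817e-34, 1.380649e-23     # SI values

def dec_over_relax(M, T, dx):
    """Ratio of decoherence time to relaxation time 1/gamma."""
    return hbar**2 / (2.0 * M * k * T * dx**2)

ratio = dec_over_relax(M=1e-3, T=300.0, dx=1e-2)
print(ratio)           # fantastically small: coherence between the two
assert ratio < 1e-30   # positions is lost on any dissipative timescale
```

Whatever the dissipation rate \gamma, the superposition of macroscopically distinct positions loses coherence some forty orders of magnitude faster than it loses energy, which is the quantitative content of the answer to Fermi's question.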
capacity of any set of observers—they can cover phenomena in all parts of the
universe and at all epochs that could be observed, whether or not any observer was
present. Maximal sets are the most refined descriptions of the universe that may
be assigned probabilities in quantum mechanics.
The class of maximal sets possible for the universe depends, of course, on the
completely fine-grained histories that are presented by the actual quantum theory
of the universe. If we utilize to the full, at each moment of time, all the projections
permitted by transformation theory, which gives quantum mechanics its protean
character, then there is an infinite variety of completely fine-grained sets, as illus-
trated in Figure 2. However, were there some fundamental reason to restrict the
completely fine grained sets, as would be the case if sum-over-histories quantum
mechanics were fundamental, then the class of maximal sets would be smaller as
illustrated in Figure 4. We shall proceed as if all fine grainings are allowed.
If a full correlation exists between a projection in a coarse graining and an-
other projection not included, then the finer graining including both still defines
a decoherent set of histories. In a maximal set of decoherent histories, both corre-
lated projections must be included if either one is included. Thus, in the mechanism
has the same value it would have had when computed with the density matrix of
the universe, \rho, for a given set of coarse-grained histories. The density matrix \tilde\rho thus
reproduces the decoherence functional for this set of histories, and in particular their
probabilities, but possesses as little information as possible beyond those properties.
A fine graining of a set of alternative histories leads to more conditions on \tilde\rho of
the form (32) than in the coarser-grained set. In nontrivial cases S(\tilde\rho) is, therefore,
lowered and \tilde\rho becomes closer to \rho.
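The statement that nontrivial fine graining lowers S(\tilde\rho) can be seen in the simplest case of commuting projections at a single time, where the maximum-entropy density matrix compatible with the history probabilities spreads each probability uniformly over the corresponding subspace. The dimensions and probabilities in this sketch are illustrative assumptions:

```python
import numpy as np

# Illustration of S(rho-tilde) decreasing under fine graining: for
# commuting one-time projections, the maximum-entropy density matrix
# compatible with conditions of the form (32) is diagonal, spreading each
# coarse probability uniformly over its subspace.  Numbers are illustrative.

def S(p):
    """von Neumann entropy of a diagonal density matrix (in nats)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Coarse graining: projections onto span{|0>,|1>} with probability 0.7
# and span{|2>,|3>} with probability 0.3.
rho_coarse = np.array([0.35, 0.35, 0.15, 0.15])

# A consistent fine graining: one projector per basis state
# (0.4 + 0.3 = 0.7 and 0.2 + 0.1 = 0.3, so the sum rules hold).
rho_fine = np.array([0.40, 0.30, 0.20, 0.10])

assert S(rho_fine) < S(rho_coarse)   # more conditions, lower entropy
```

A redundant "refinement" that merely repeats the same projections would add no new conditions and leave S(\tilde\rho) unchanged, which is exactly the diagnostic role the text assigns to this quantity.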
If the insertion of apparently new P's into a chain is redundant, then S(\tilde\rho) will
not be lowered. A simple example will help to illustrate this: Consider the set of
histories consisting of projections P_i(t_m) which project onto an orthonormal basis
for Hilbert space at one time, t_m. Trivial further decoherent fine grainings can be
constructed as follows: At each other time t_k introduce a set of projections \tilde P_i(t_k)
that, through the equations of motion, are identical operators in Hilbert space to
the set P_i(t_m). In this way, even though we are going through the motions of
introducing a completely fine-grained set of histories covering all the times, we are
really just repeating the projections P_i(t_m) over and over again. We thus have a
completely fine-grained set of histories that, in fact, consists of just one fine-grained
set of projections and decoheres exactly because there is only one such set. Indeed,
in terms of S(\tilde\rho) it is no closer to maximality than the set consisting of P_i(t_m) at
one time. The quantity S(\tilde\rho) thus serves to identify such trivial refinements, which
amount to redundancy in the conditions (32).
We can generalize the example in an interesting way by constructing the special
kinds of histories mentioned after Eq. (25). We take t_m to be the final time and then
adjoin, at earlier and earlier times, a succession of progressive coarse grainings of
the set {P_i(t_m)}. Thus, as time moves forward, the only projections are finer and
finer grainings terminating in the one-dimensional P_i(t_m). We thus have again a
set of histories in which decoherence is automatic, independent of the character of
\rho, and for which S(\tilde\rho) has the same value it would have had if only conditions at
the final time had been considered.
In a certain sense, S(\tilde\rho) for histories can be regarded as decreasing with time.
If we consider S(\tilde\rho) for a string of alternative projections up to a certain time t_n,
as in Eq. (32), and then adjoin an additional set of projections for a later time, the
number of conditions on \tilde\rho is increased and thus the value of S(\tilde\rho) is decreased (or,
in trivial cases, unchanged). That is natural, since S(\tilde\rho) is connected with the lack
of information contained in a set of histories and that information increases with
non-trivial fine graining of the histories, no matter what the times for which the
new P's are introduced. (In some related problems, a quantity like S that keeps
decreasing as a result of adjoining projections at later times can be converted into
an increasing quantity by adding an algorithmic complexity term.61)
The quantity S(\tilde\rho) is closely related to other fundamental quantities in physics.
One can show, for example, that when used with the \rho_{\rm eff} representing present data
and with alternatives at a single time, these techniques give a unified and generalized
treatment of the variety of coarse grainings commonly introduced in statistical
mechanics; and, as Jaynes and others have pointed out, the resulting S(\tilde\rho)'s are
the physical entropies of statistical mechanics. Here, however, these techniques are
applied to time histories and the initial condition is utilized. The quantity S(\tilde\rho) is
also related to the notion of thermodynamic depth currently being investigated by
Lloyd.43
VIII. CLASSICITY
Some maximal sets will be more nearly classical than others. The more nearly
classical sets of histories will contain projections (onto related ranges of values) of
operators, for different times, that are connected to one another by the unitary
transformations e^{-iH(t''-t')/\hbar} and that are correlated for the most part along classical
paths, with probabilities near zero and one for the successive projections. This
pattern of classical correlation may be disturbed by the inclusion, in the maximal
set of projection operators, of other variables, which do not behave in this way
(as in measurement situations to be described later). The pattern may also be
See, e.g., E. Joos,38 H. Zeh, C. Kiefer,50 J. Halliwell,25 and T. Fukuyama and M. Morikawa.18
species. The sizes of the volumes are limited above by maximality and are limited
below by classicity because they require sufficient "inertia" to enable them to resist
deviations from predictability caused by their interactions with one another, by
quantum spreading, and by the quantum and statistical fluctuations resulting from
interactions with the rest of the universe. Suitable integrals of densities of approx-
imately conserved quantities are thus candidates for habitually decohering quasi-
classical operators. Field theory is local, and it is an interesting question whether
that locality somehow picks out local densities as the source of habitually decoher-
ing quantities. It is hardly necessary to note that such hydrodynamic variables are
among the principal variables of classical physics.[13]
In the case of densities of conserved quantities, the integrals would not change
at all if the volumes were infinite. For smaller volumes we expect approximate per-
sistence. When, as in hydrodynamics, the rates of change of the integrals form part
of an approximately closed system of equations of motion, the resulting evolution
is just as classical as in the case of persistence.
X. BRANCH DEPENDENCE
As the discussion in Sections V and IX shows, physically interesting mechanisms
for decoherence will operate differently in different alternative histories for the uni-
verse. For example, hydrodynamic variables defined by a relatively small set of
volumes may decohere at certain locations in spacetime in those branches where a
gravitationally condensed body (e.g., the earth) actually exists, and may not deco-
here in other branches where no such condensed body exists at that location. In the
latter branch there simply may not be enough "inertia" for densities defined with
such small volumes to resist deviations from predictability. Similarly, alternative spin
directions associated with Stern-Gerlach beams may decohere for those branches
on which a photographic plate detects their beams and not in a branch where they
recombine coherently instead. There are no variables that are expected to decohere
universally. Even the mechanisms causing spacetime geometry at a given location
to decohere on scales far above the Planck length cannot necessarily be expected
to operate in the same way on a branch where the location is the center of a black
hole as on those branches where there is no black hole nearby.
How is such "branch dependence" described in the formalism we have elabo-
rated? It is not described by considering histories where the set of alternatives at
one time (the k in a set of P^k_\alpha) depends on specific alternatives (the \alpha's) of sets
of earlier times. Such dependence would destroy the derivation of the probability
sum rules from the fundamental formula. However, there is no such obstacle to the
set of alternatives at one time depending on the sets of alternatives at all previous
[13] For discussion of how such hydrodynamic variables are distinguished in non-equilibrium sta-
tistical mechanics in not unrelated ways see, e.g., L. Kadanoff and P. Martin,39 D. Forster,14 and
J. Lebowitz.42
strongly correlated with the quasiclassical ones at particular times. Such operators,
not normally decohering, are, in fact, included among the decohering set only by
virtue of their correlation with a habitually decohering one. In this case we have a
measurement situation of the kind usually discussed in quantum mechanics. Sup-
pose, for example, in the inevitable Stern-Gerlach experiment, that \sigma_z of a spin-1/2
particle is correlated with the orbit of an atom in an inhomogeneous magnetic field.
If the two orbits decohere because of interaction with something else (the atomic
excitations in a photographic plate for example), then the spin direction will be
included in the maximal set of decoherent histories, fully correlated with the deco-
hering orbital directions. The spin direction is thus measured.
The recovery of the Copenhagen rule for when probabilities may be assigned
is immediate. Measured quantities are correlated with decohering histories. De-
cohering histories can be assigned probabilities. Thus in the two-slit experiment
(Figure 1), when the electron interacts with an apparatus that determines which
slit it passed through, it is the decoherence of the alternative configurations of the
apparatus that enables probabilities to be assigned for the electron.
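The two-slit statement can be written out in one line of algebra: with the electron amplitudes through the slits correlated to apparatus states |A_1\rangle and |A_2\rangle, the interference term at the screen is weighted by \langle A_1|A_2\rangle and vanishes when the apparatus records which slit was used. The amplitudes in this sketch are illustrative assumptions:

```python
import numpy as np

# Which-path correlation killing interference, in operator form: the
# screen intensity is |psi1|^2 + |psi2|^2 + 2 Re(psi1* psi2 <A1|A2>).
# The two amplitudes and the phase are illustrative assumptions.

psi1, psi2 = 0.6, 0.8 * np.exp(1j * 0.7)    # amplitudes via slits 1 and 2

def intensity(overlap):
    """Screen intensity for a given apparatus-state overlap <A1|A2>."""
    return abs(psi1)**2 + abs(psi2)**2 + 2 * np.real(np.conj(psi1) * psi2 * overlap)

no_detector = intensity(1.0)    # apparatus uncorrelated: full interference
which_path = intensity(0.0)     # apparatus records the slit: orthogonal states

assert np.isclose(which_path, abs(psi1)**2 + abs(psi2)**2)  # classical sum
```

The classical addition of probabilities for the two alternatives is thus not postulated; it is delivered by the decoherence of the apparatus configurations, exactly as the text states.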
Correlation between the ranges of values of operators of a quasiclassical domain
is the only defining property of a measurement situation. Conventionally, measure-
ments have been characterized in other ways. Essential features have been seen to
be irreversibility, amplification beyond a certain level of signal-to-noise, association
with a macroscopic variable, the possibility of further association with a long chain
of such variables, and the formation of enduring records. Efforts have been made
to attach some degree of precision to words like "irreversible", "macroscopic", and
"record", and to discuss what level of "amplification" needs to be achieved 1141 While
such characterizations of measurement are difficult to define preciselyP5) some can
be seen in a rough way to be consequences of the definition that we are attempting
to introduce here, as follows:
Correlation of a variable with the quasiclassical domain (actually, inclusion in
its set of histories) accomplishes the amplification beyond noise and the association
with a macroscopic variable that can be extended to an indefinitely long chain of
such variables. The relative predictability of the classical world is a generalized form
of record. The approximate constancy of, say, a mark in a notebook is just a special
case; persistence in a classical orbit is just as good.
Irreversibility is more subtle. One measure of it is the cost (in energy, money,
etc.) of tracking down the phases specifying coherence and restoring them. This is
intuitively large in many typical measurement situations. Another, related measure
is the negative of the logarithm of the probability of doing so. If the probability of
restoring the phases in any particular measurement situation were significant, then
we would not have the necessary amount of decoherence. The correlation could
not be inside the set of decohering histories. Thus, this measure of irreversibility is
large. Indeed, in many circumstances where the phases are carried off to infinity or
lost in photons impossible to retrieve, the probability of recovering them is truly
zero and the situation perfectly irreversible—infinitely costly to reverse and with
zero probability for reversal!
Defining a measurement situation solely as the existence of correlations in a qua-
siclassical domain, if suitable general definitions of maximality and classicity can be
found, would have the advantages of clarity, economy, and generality. Measurement
situations occur throughout the universe and without the necessary intervention of
anything as sophisticated as an "observer". Thus, by this definition, the production
of fission tracks in mica deep in the earth by the decay of a uranium nucleus leads
to a measurement situation in a quasiclassical domain in which the track directions
decohere, whether or not these tracks are ever registered by an "observer".
coarse graining is very much coarser than that of the quasiclassical domain since it
utilizes only a few of the variables in the universe.
The reason such systems as IGUSes exist, functioning in such a fashion, is to
be sought in their evolution within the universe. It seems likely that they evolved to
make predictions because it is adaptive to do so.[16] The reason, therefore, for their
focus on decohering variables is that these are the only variables for which predic-
tions can be made. The reason for their focus on the histories of a quasiclassical
domain is that these present enough regularity over time to permit the generation
of models (schemata) with significant predictive power.
If there is essentially only one quasiclassical domain, then naturally the IGUS
utilizes further coarse grainings of it. If there are many essentially inequivalent
quasiclassical domains, then we could adopt a subjective point of view, as in some
traditional discussions of quantum mechanics, and say that the IGUS "chooses"
its coarse graining of histories and, therefore, "chooses" a particular quasiclassical
domain, or a subset of such domains, for further coarse graining. It would be better,
however, to say that the IGUS evolves to exploit a particular quasiclassical domain
or set of such domains. Then IGUSes, including human beings, occupy no special
place and play no preferred role in the laws of physics. They merely utilize the
probabilities presented by quantum mechanics in the context of a quasiclassical
domain.
XIII. CONCLUSIONS
We have sketched a program for understanding the quantum mechanics of the
universe and the quantum mechanics of the laboratory, in which the notion of
quasiclassical domain plays a central role. To carry out that program, it is important
to complete the definition of a quasiclassical domain by finding the general definition
for classicity. Once that is accomplished, the question of how many and what kinds
of essentially inequivalent quasiclassical domains follow from p and H becomes a
topic for serious theoretical research. So is the question of what are the general
properties of IGUSes that can exist in the universe exploiting various quasiclassical
domains, or the unique one if there is essentially only one.
It would be a striking and deeply important fact of the universe if, among its
maximal sets of decohering histories, there were one roughly equivalent group with
much higher classicities than all the others. That would then be the quasiclassical
domain, completely independent of any subjective criterion, and realized within
quantum mechanics by utilizing only the initial condition of the universe and the
Hamiltonian of the elementary particles.
[16] Perhaps, as W. Unruh has suggested, there are complex adaptive systems, making no use of
prediction, that can function in a highly quantum-mechanical way. If this is the case, they are
very different from anything we know or understand.
Whether the universe exhibits one or many maximal sets of branching alter-
native histories with high classicities, those quasiclassical domains are the possible
arenas of prediction in quantum mechanics.
It might seem at first sight that in such a picture the complementarity of
quantum mechanics would be lost; in a given situation, for example, either a mo-
mentum or a coordinate could be measured, leading to different kinds of histories.
We believe that impression is illusory. The histories in which an observer, as part
of the universe, measures p and the histories in which that observer measures x are
decohering alternatives. The important point is that the decoherent histories of a
quasiclassical domain contain all possible choices that might be made by all possible
observers that might exist, now, in the past, or in the future for that domain.
The EPR or EPRB situation is no more mysterious. There, a choice of mea-
surements, say, \sigma_z or \sigma_y for a given electron, is correlated with the behavior of
\sigma_z or \sigma_y for another electron because the two together are in a singlet spin state
even though widely separated. Again, the two measurement situations (for \sigma_z and
\sigma_y) decohere from each other, but here, in each, there is also a correlation between
the information obtained about one spin and the information that can be obtained
about the other. This behavior, although unfortunately called "non-local" by some
authors, involves no non-locality in the ordinary sense of quantum field theory and
no possibility of signaling outside the light cone. The problem with the "local real-
ism" that Einstein would have liked is not the locality but the realism. Quantum
mechanics describes alternative decohering histories and one cannot assign "reality"
simultaneously to different alternatives because they are contradictory. Everett10
and others7 have described this situation, not incorrectly, but in a way that has
confused some, by saying that the histories are all "equally real" (meaning only
that quantum mechanics prefers none over another except via probabilities) and by
referring to "many worlds" instead of "many histories".
We conclude that resolution of the problems of interpretation presented by
quantum mechanics is not to be accomplished by further intense scrutiny of the
subject as it applies to reproducible laboratory situations, but rather through an
examination of the origin of the universe and its subsequent history. Quantum
mechanics is best and most fundamentally understood in the context of quantum
cosmology. The founders of quantum mechanics were right in pointing out that
something external to the framework of wave function and Schrödinger equation is
needed to interpret the theory. But it is not a postulated classical world to which
quantum mechanics does not apply. Rather it is the initial condition of the universe
that, together with the action function of the elementary particles and the throws
of quantum dice since the beginning, explains the origin of quasiclassical domain(s)
within quantum theory itself.
456 Murray Gell-Mann and James B. Hartle
ACKNOWLEDGMENTS
One of us, MG-M, would like to acknowledge the great value of conversations about
the meaning of quantum mechanics with Felix Villars and Richard Feynman in
1963-64 and again with Richard Feynman in 1987-88. He is also very grateful to
Valentine Telegdi for discussions during 1985-86, which persuaded him to take up
the subject again after twenty years. Both of us are indebted to Telegdi for further
interesting conversations since 1987. We would also like to thank R. Griffiths for a
useful communication and a critical reading of the manuscript and R. Penrose for
a helpful discussion.
Part of this work was carried out at various times at the Institute for Theoretical
Physics, Santa Barbara, the Aspen Center for Physics, the Santa Fe Institute, and
the Department of Applied Mathematics and Theoretical Physics, University of
Cambridge. We are grateful for the hospitality of these institutions. The work of
JBH was supported in part by NSF grant PHY85-06686 and by a John Simon
Guggenheim Fellowship. The work of MG-M was supported in part by the U.S.
Department of Energy under contract DE-AC-03-81ER40050 and by the Alfred P.
Sloan Foundation.
REFERENCES
For a subject as large as this one it would be an enormous task to cite the literature
in any historically complete way. We have attempted to cite only papers that we
feel will be directly useful to the points raised in the text. These are not always
the earliest nor are they always the latest. In particular we have not attempted to
review or to cite papers where similar problems are discussed from different points
of view.
1. Aharonov, Y., P. Bergmann, and J. Lebowitz. Phys. Rev. B134 (1964):1410.
2. Bohr, N. Atomic Physics and Human Knowledge. New York: John Wiley,
1958.
3. Caldeira, A. 0., and A. J. Leggett. Physica 121A (1983):587.
4. Coleman, S. Nucl. Phys. B310 (1988):643.
5. Cooper, L., and D. VanVechten. Am. J. Phys. 37 (1969):1212.
6. Daneri, A., A. Loinger, and G. M. Prosperi. Nucl. Phys. 33 (1962):297.
7. DeWitt, B. Physics Today 23(9) (1970).
8. DeWitt, B., and R. N. Graham. The Many Worlds Interpretation of Quantum
Mechanics. Princeton: Princeton University Press, 1973.
9. Dicke, R. H. Am. J. Phys. 49 (1981):925.
10. Everett, H. Rev. Mod. Phys. 29 (1957):454.
11. Farhi, E., J. Goldstone, and S. Gutmann. To be published.
12. Feynman, R. P., and F. L. Vernon. Ann. Phys. (N.Y.) 24 (1963):118.
Quantum Mechanics in the Light of Quantum Cosmology 457
1. INTRODUCTION
The point of this article is to discuss some recent work on the application of a body of
ideas normally used in quantum measurement theory to quantum cosmology. The
question that I will address is the following: How, in a quantum theory of gravity
as applied to closed cosmological systems, i.e., in quantum cosmology, does the
gravitational field become classical? The possible answer to this question that I will
discuss involves decoherence of the density matrix of the universe. This necessarily
involves the dissipation of information, making contact with the information theme
of this meeting. But before proceeding to quantum cosmology, we begin by dis-
cussing the emergence of classical behavior in some more down-to-earth quantum
systems.
It is one of the undeniable facts of our experience that the world about us is
described by classical laws to a very high degree of accuracy. In classical mechanics,
a system may be assigned a quite definite state and its evolution is described in a
deterministic manner—given the state of the system at a particular time, one can
predict its state at a later time with certainty. And yet, it is believed that the world
is fundamentally quantum mechanical in nature. Phenomena on all scales up to and
including the entire universe are supposedly described by quantum mechanics. In
quantum mechanics, because superpositions of interfering states are permissible,
it is generally not possible to say that a system is in a definite state. Moreover,
evolution is not deterministic but probabilistic—given the state of the system at a
particular time, one can calculate only the probability of finding it in another state
at a later time.
If quantum theory is to be reconciled with our classical experience, it is clearly
essential to understand the sense in which, and the extent to which, quantum
mechanics reproduces the effects of classical mechanics. This is an issue that assumes particular importance in the quantum theory of measurement. There, one
describes the measuring apparatus in quantum mechanical terms; yet all such ap-
parata behave in a distinctly classical manner when the experimenter's eye reads
the meter.
Early universe cosmology provides another class of situations in which the
emergence of classical behavior from quantum mechanics is a process of particu-
lar interest. In the inflationary universe scenario, for example, the classical density
fluctuations required for galaxy formation supposedly originate in the quantum
fluctuations of a scalar field, hugely amplified by inflation.10,21 This is, in a sense,
an extreme example of a quantum measurement process, in that the large-scale
structure of the universe that we see today is a meter which has permanently
recorded the quantum state of the scalar field at early times. The manner in which
this quantum to classical transition comes about has been discussed by numerous
authors.11,12,28,38,39 A more fundamental situation of interest, and the one with
Information Dissipation in Quantum Cosmology 461
In the configuration space representation, the coherent states |z_n(t)⟩ are given by
They are Gaussian wavepackets strongly peaked about the classical trajectories
z_n(t). One might therefore be tempted to say that the system has become classical,
and that the particle will be following one of the trajectories z_n(t) with probability
|c_n|². The problem, however, is that if the wavepackets met up at some stage in
the future, then they would interfere constructively. One could not, therefore, say
that the particle is following a definite trajectory.
462 Jonathan J. Halliwell
The problem is highlighted when one writes down the pure state density matrix
corresponding to the state (2.1). It is
This differs from Eq. (2.3) by the presence of off-diagonal terms. It is only when
these terms may be neglected that we may say that the particle is following a
definite trajectory.
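The contrast between the two density matrices can be made concrete with a small numerical sketch (an added illustration; two nearly orthogonal Gaussian wavepackets on a grid stand in for the coherent states |z_n(t)⟩):

```python
import numpy as np

# Two (nearly orthogonal) normalized "wavepacket" states on a grid,
# standing in for the coherent states |z_1(t)>, |z_2(t)> of the text.
x = np.linspace(-10, 10, 400)
def packet(center):
    psi = np.exp(-(x - center)**2)
    return psi / np.linalg.norm(psi)

z1, z2 = packet(-3.0), packet(3.0)
c1, c2 = 1/np.sqrt(2), 1/np.sqrt(2)

# Pure-state density matrix of the superposition c1|z1> + c2|z2>:
psi = c1*z1 + c2*z2
rho_pure = np.outer(psi, psi.conj())

# Mixed ("classical") density matrix: same diagonal weights, no interference.
rho_mixed = abs(c1)**2 * np.outer(z1, z1.conj()) \
          + abs(c2)**2 * np.outer(z2, z2.conj())

# The difference is exactly the off-diagonal (interference) part,
# c1 c2* |z1><z2| + c2 c1* |z2><z1|, which is not small:
interference = rho_pure - rho_mixed
print(np.max(np.abs(interference)))
```

Unitary evolution preserves the purity of rho_pure, which is why the interference part cannot simply disappear on its own; that is the difficulty resolved by decoherence below.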
There is no way that under unitary Schrödinger evolution the pure-state den-
sity matrix (2.4) will evolve into the mixed-state density matrix (2.3). How, then,
may the interference terms be suppressed? The resolution of this apparent diffi-
culty comes from the recognition that no macroscopic system can realistically be
considered as closed and isolated from the rest of the world around it. Laboratory
measuring apparata interact with surrounding air molecules; even intergalactic gas
molecules are not isolated, because they interact with the microwave background.
Let us refer to the rest of the world as "the environment," E. Then it can be argued
that it is the inescapable interaction with the environment which leads to a con-
tinuous "measuring" or "monitoring" of a macroscopic system and it is this that
causes the interference terms to become very small. This is decoherence.
Let us study this in more detail. Consider the system S considered above, but
now take into account also the states {|E_n⟩} of the environment E. Let the initial
state of the total system SE be
The coherent states of the system |z_n(t)⟩ thus become correlated with the envi-
ronment states I En). The point, however, is that one is not interested in the state
of the environment. This is traced out in the calculation of any quantities of inter-
est. The object of particular relevance, therefore, is the reduced or coarse-grained
density matrix, obtained by tracing over the environment states:
ρ_S(t) = Tr_E |Φ(t)⟩⟨Φ(t)| = Σ_{n,m} c_n·c_m* ⟨E_m|E_n⟩ |z_n(t)⟩⟨z_m(t)|   (2.7)
Information Dissipation in Quantum Cosmology 463
The density matrix |Φ(t)⟩⟨Φ(t)| of the total system evolves unitarily, of course. The
reduced density matrix (2.7), however, does not. It therefore holds the possibility
of evolving an initially pure state to a final mixed state. In particular, if, as can be
the case, the inner products ⟨E_m|E_n⟩ are very small when n ≠ m, then Eq. (2.7)
will be indistinguishable from the mixed-state density matrix (2.3).
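A minimal sketch of this mechanism (an added illustration, with a two-state system and a two-state environment; the overlap ⟨E_1|E_2⟩ is left as a free parameter): when the environment states are nearly orthogonal, the off-diagonal terms of the reduced density matrix (2.7) are suppressed.

```python
import numpy as np

# A two-state system correlated with environment states |E_1>, |E_2>.
def reduced_density(env_overlap):
    c = np.array([1, 1], dtype=complex) / np.sqrt(2)
    # System basis states |z_1>, |z_2> (orthonormal for simplicity).
    z = np.eye(2, dtype=complex)
    # Environment states with <E_1|E_2> = env_overlap.
    E1 = np.array([1, 0], dtype=complex)
    E2 = np.array([env_overlap, np.sqrt(1 - env_overlap**2)], dtype=complex)
    E = [E1, E2]
    # Total state |Phi> = sum_n c_n |z_n>|E_n>, then trace out E:
    phi = sum(c[n] * np.kron(z[n], E[n]) for n in range(2))
    rho = np.outer(phi, phi.conj()).reshape(2, 2, 2, 2)
    return np.trace(rho, axis1=1, axis2=3)  # rho_S[m, n] = c_m c_n* <E_n|E_m>

# Large overlap: interference survives.  Tiny overlap: it decoheres.
print(np.round(np.abs(reduced_density(0.99)), 3))
print(np.round(np.abs(reduced_density(1e-6)), 3))
```

The total state stays pure throughout; it is only the act of tracing over the unobserved environment that turns the system's description into an effectively mixed one.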
One may now say that the environment has caused the density matrix to
decohere—it has permitted the interfering set of macroscopic configurations to re-
solve into a non-interfering ensemble of states, as used in classical statistical me-
chanics. Or to put it another way, the environment has "collapsed the wave func-
tion" of the system. Or yet another form of words, is to say that the environment
"induces a superselection rule" which forbids superpositions of distinct macroscopic
states from being observed. Note that the loss of information is an important as-
pect to the process. Classical behavior thus emerges only when information about
correlations is dissipated into the environment.
This general body of ideas has been discussed by many people, including Gell-
Mann, Hartle and Telegdi,5 Griffiths,8,9 Joos and Zeh,25 Omnes,35 Peres,37 Unruh
and Zurek,40 Wigner,49 Zeh,50 and Zurek.53,54,55,56
3. QUANTUM COSMOLOGY
We now apply the ideas introduced in the previous section to quantum cosmol-
ogy. This subject began life in the 1960's, with the seminal works of DeWitt,2
Misner,30,31,32,33 and Wheeler.46,47 More recently, it has been revitalized primar-
ily by Hartle and Hawking20 and by Vilenkin.41,42,43,44,45 Some review articles are
those by Hartle16,17 and Halliwell.14
The approach is to apply ideas from an as-yet incomplete quantum theory
of gravity to closed cosmological models. One imagines that the four-dimensional
space-time is sliced up into three-surfaces, and one concentrates on the variables
defined on the three-surfaces which describe the configuration of the gravitational
and matter fields. These are the three-metric h_ij and the matter field, which we take
to be a scalar field φ. The quantum state of the system is then represented by a wave
functional Ψ[h_ij, φ], a functional of the metric and scalar field configurations. For
rather fundamental reasons, the wave functional does not depend on time explicitly.
Loosely speaking, information about time is already contained in the variables h_ij
and φ. Because it does not have an explicit time label, Ψ obeys not a time-dependent
Schrödinger equation, but a zero-energy equation of the form
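The displayed equation is lost in this scan. The zero-energy equation in question is the Wheeler–DeWitt equation; its standard schematic form is reproduced below as a reconstruction (conventions and factor ordering vary between authors, so this should not be read as the author's exact expression), with the momentum replaced by π^ij → −i δ/δh_ij:

```latex
% Wheeler-DeWitt equation, schematic form (reconstruction):
\mathcal{H}\,\Psi \equiv
\left[ -\,G_{ijkl}\,\frac{\delta^{2}}{\delta h_{ij}\,\delta h_{kl}}
  - \sqrt{h}\left({}^{3}R - 2\Lambda\right)
  + \mathcal{H}_{\phi} \right] \Psi[h_{ij},\phi] = 0
```

Here G_ijkl is the DeWitt supermetric, ³R the intrinsic curvature of the three-surface, and ℋ_φ the Hamiltonian density of the scalar field.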
where π^ij is the momentum conjugate to h_ij.[1] The wave function (3.2) is therefore
analogous to the sum of coherent states (2.1). The wave function Ψ_n[h_ij, φ] is a
slowly varying function of the three-metric. It describes quantum field theory for
the scalar field φ on the gravitational background h_ij.
So the first requirement for the emergence of classical behavior is satisfied by
the solution (3.2)—the wave function is peaked about a set of classical solutions.
But what about the second requirement, decoherence? Let us apply the ideas in-
troduced in the previous sections and introduce an environment which continually
monitors the metric. One meets with an immediate difficulty. The entire universe
has no environment. It is not an open system, but a closed one: in fact, it is the only
genuinely closed system we know of. The point, however, is that one is never inter-
ested in measuring more than a small fraction of the potentially observable features
of the universe. One may therefore regard just some of the variables describing the
universe as the observed system and the rest as environment. The latter are traced
out in the density matrix. In this way, some, but certainly not all, of the variables
describing the universe may become classical.
Which variables do we take to be the environment? There is, in general, no
obvious natural choice. However, here we are interested in understanding how the
gravitational field becomes classical, so it is perhaps appropriate to regard the
matter modes as an environment for the metric. With this choice, the reduced
density matrix corresponding to the wave function (3.2) is
The object is to show that this is small for h'_ij ≠ h_ij. It is very difficult to offer
general arguments as to the extent to which this is the case, but one can see it
for particular models. Numerous models have been considered in the literature
[1]It is actually rather difficult to construct the analogue of coherent states in quantum cosmology.
See, however, Kiefer.27
(for example Fukuyama and Morikawa,34 Halliwell,15 Kiefer,26 Mellor and Moss,29
Morikawa,34 Padmanabhan,36 Zeh61,62).
For definiteness, let us briefly consider one particular model.15 Suppose we
restrict the metric to be of the Robertson-Walker type:
ds² = −dt² + a²(t) dΩ₃²   (3.5)
where dΩ₃² is the metric on the three-sphere. Then the gravitational field is described
solely by the scale factor a. Let us take the only source to be a cosmological constant
A. One may show that the wave function for this model is of the form (3.2), and
the e^{iS} part indicates that it is peaked about classical solutions of the form
a(t) = (1/H) cosh(Ht)   (3.6)
where H² = Λ/3. This is de Sitter space. Most models that have been considered
in the literature use the full infinite number of modes of the scalar field as the
environment. However, this leads to certain technical complications, so here we will
do something simpler. The de Sitter solutions have a horizon size H⁻¹. One
may separate the scalar field modes into long (1) or short (s) wavelength modes,
φ = φ_l + φ_s, depending on whether their wavelength is, respectively, greater or less
than the horizon size. The number of modes outside the horizon is actually finite;
moreover, they are not observable, so it seems reasonable to consider these as the
environment. With this choice, and with a particular choice for the quantum state
of the scalar field, one finds that the reduced density matrix is
ρ(a, ā) ≈ exp[−(a − ā)²/(σ²a)]   (3.7)
This differs from Eq. (3.8) in one crucial respect, namely in the sign between a
and ā in Eq. (4.2). This has the consequence that ρ is always very small, even
when a = ā. The interference between expanding and collapsing components of the
wave function may therefore be neglected.
[2]Because there is no explicit time label, one cannot say which of the two solutions corresponds
to collapsing and which corresponds to expanding—one can only make relative statements. I am
grateful to H. D. Zeh for emphasizing this point to me.
ACKNOWLEDGMENTS
I would like to thank Jim Hartle, Raymond Laflamme, Seth Lloyd, Jorma Louko,
Ian Moss, Don Page and H. Dieter Zeh for useful conversations. I am particularly
grateful to Wojciech Zurek for many very enlightening discussions on decoherence.
I would also like to thank Wojciech for organizing such an interesting and successful
meeting.
REFERENCES
1. Coleman, S. Nucl. Phys. B310 (1988):643.
2. DeWitt, B. Phys. Rev. 160 (1967):1113.
3. Ellis, J., S. Mohanty, and D. V. Nanopoulos. Phys. Lett. 221B (1989):113.
4. Fukuyama, T., and M. Morikawa. Kyoto preprint KUNS 936, 1988.
5. Gell-Mann, M., J. B. Hartle and V. Telegdi. Work in progress, 1989.
6. Giddings, S., and A. Strominger. Nucl. Phys. B306 (1988):890.
7. Giddings, S., and A. Strominger. Nucl. Phys. B307 (1988):854.
8. Griffiths, R. J. Stat. Phys. 36 (1984):219.
9. Griffiths, R. Am. J. Phys. 55 (1987):11.
10. Guth, A. H., and S. Y. Pi. Phys. Rev. Lett. 49 (1982):1110.
11. Guth, A. H., and S. Y. Pi. Phys. Rev. D32 (1985):1899.
12. Halliwell, J. J. Phys. Lett. 185B (1987):341.
13. Halliwell, J. J. Phys. Rev. D36 (1987):3626.
14. Halliwell, J. J. Santa Barbara ITP preprint NSF-ITP-88-131, 1988. An exten-
sive list of papers on quantum cosmology may be found in J. J. Halliwell, ITP
preprint NSF-ITP-88-132, 1988.
15. Halliwell, J. J. Phys. Rev. D39 (1989):2912.
16. Hartle, J. B. In High Energy Physics, proceedings of the Yale Summer School,
New Haven, Connecticut, edited by M. J. Bowick and F. Gursey. Singapore:
World Scientific, 1985.
17. Hartle, J. B. In Gravitation in Astrophysics, Proceedings of the Cargese Ad-
vanced Summer Institute, Cargese, France, 1986.
18. Hartle, J. B. Phys. Rev. D37 (1988):2818.
19. Hartle, J. B. Phys. Rev. D38 (1988):2985.
20. Hartle, J. B., and S. W. Hawking. Phys. Rev. D28 (1983):2960.
21. Hawking, S. W. Phys. Lett. 115B (1982):295.
22. Hawking, S. W. Phys. Lett. 195B (1987):337.
23. Hawking, S. W. Phys. Rev. D37 (1988):904.
24. Hawking, S. W., and R. Laflamme. Phys. Lett. 209B (1988):39.
25. Joos, E., and H. D. Zeh. Z. Phys. B 59 (1985):223.
26. Kiefer, C. Class. Quantum Grav. 4 (1987):1369.
27. Kiefer, C. Phys. Rev. D38 (1988):1761.
28. Lyth, D. Phys. Rev. D31 (1985):1931.
29. Mellor, F., and I. G. Moss. Newcastle preprint, 1988.
30. Misner, C. W. Phys. Rev. 186 (1969):1319.
31. Misner, C. W. Phys. Rev. Lett. 22 (1969):1071.
32. Misner, C. W. In Relativity, edited by M. Carmeli, S. Fickler and L. Witten.
San Francisco: Plenum, 1970.
33. Misner, C. W. In Magic without Magic: John Archibald Wheeler, a Collection
of Essays in Honor of his 60th Birthday, edited by J. Klauder. San Francisco:
Freeman, 1972.
34. Morikawa, M. Kyoto Preprint KUNS 923, 1988.
INTRODUCTION
Let me start off by telling you a science fiction story that is essentially in the tradi-
tion of curious stories about quantum mechanics, like the story about Schrödinger's
cat and the story about Wigner's friend. Those stories both begin with the assump-
tion that every physical system in the world (not merely subatomic particles, but
measuring instruments and tables and chairs and cats and people and oceans and
stars, too) is a quantum-mechanical system, and that all such systems evolve en-
tirely in accordance with the linear quantum-mechanical equations of motion, and
that every self-adjoint local operator of such systems can, at least in principle, be
measured. Those are the rules of the game we're going to play here; and what I
want to tell you about is a move which is possible in this game, but which hasn't
been considered before.
The old stories of Schrödinger's cat and Wigner's friend end at a point where
(in the first case) the cat is in a superposition of states, one in which it is alive
and the other in which it is dead; or where (in the second case) the friend is in
a superposition of states that entail various mutually exclusive beliefs about the
result of some given experiment. Suppose, for example, that Wigner's friend carries
out a measurement of the y-spin of a spin-1/2 particle p that is initially prepared
in the state |σ_x = +1/2⟩_p. He carries out the measurement by means of a measuring
device that interacts with p and that he subsequently looks at in order to ascertain
the result of the measurement. The end of that story looks like this:
|α⟩ = (1/√2) [ |Believes that σ_y = +1/2⟩_Friend · |Shows that σ_y = +1/2⟩_Measuring Device · |σ_y = +1/2⟩_p
+ |Believes that σ_y = −1/2⟩_Friend · |Shows that σ_y = −1/2⟩_Measuring Device · |σ_y = −1/2⟩_p ]
(The phrase "Believes that σ_y = +1/2," of course, doesn't completely specify the
quantum state of Wigner's friend's very complicated brain. But the many other
degrees of freedom of that system (those, for example, that specify what sort of ice
cream Wigner's friend prefers) simply don't concern us here, and so, for the moment,
we'll ignore them.) Now, such endings as this are usually judged to be so bizarre,
and so blatantly to contradict daily experience, as to invalidate the assumption that
gives rise to these stories. That is, these stories are usually judged to imply that
there must be physical processes in the world that cannot be described by linear
equations of motion, processes like the collapse of the wave function.
There are, on the other hand, as everybody knows, a number of ways of at-
tempting to deny this judgement; there are a number of ways of attempting to
suppose that this is genuinely the way things are at the end of a measuring process,
and that this state somehow manages to appear to Wigner's friend or to count for
Wigner's friend either as a case of believing that σ_y = +1/2 or a case of believing
that σ_y = −1/2.
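The state |α⟩ of the story can be assembled explicitly in a toy model (an added sketch, with two-dimensional stand-ins for the friend's and the device's relevant degrees of freedom). Tracing out everything but the friend shows why, locally, the friend's state looks like an equal mixture of the two belief states:

```python
import numpy as np

def kron_all(*vs):
    out = np.array([1.0 + 0j])
    for v in vs:
        out = np.kron(out, v)
    return out

# sigma_y eigenstates of the particle p.
y_plus  = np.array([1,  1j], dtype=complex) / np.sqrt(2)
y_minus = np.array([1, -1j], dtype=complex) / np.sqrt(2)
# Two-dimensional stand-ins for the friend's belief states and the
# device's pointer states ("shows +" / "shows -").
up, down = np.eye(2, dtype=complex)

# The end of the story: the state |alpha> (friend x device x particle).
alpha = (kron_all(up, up, y_plus) + kron_all(down, down, y_minus)) / np.sqrt(2)

# The friend's reduced state: trace out device and particle.
rho = np.outer(alpha, alpha.conj()).reshape(2, 4, 2, 4)
rho_friend = np.trace(rho, axis1=1, axis2=3)
print(np.round(np.abs(rho_friend), 3))  # equal diagonal mixture
```

No measurement on the friend alone can distinguish this state from a classical coin flip between the two beliefs; the superposition shows up only in observables of the whole composite system.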
One of these attempts goes back to Hugh Everett, and has come to be called
the Many Worlds interpretation of quantum mechanics. I think it's too bad that
Everett's very simple thesis (which is just that the linear quantal equations of
motion are always exactly right) has come to be called that at all, because that
name has sometimes encouraged a false impression that there are supposed to be
more physical universes around after a measurement than there were before it. It
might have been better to call what Everett came up with a "many-points-of-view"
interpretation of quantum mechanics, or something like that, because it is surely
true of Everett's picture (as it is in all other pictures of quantum theory that I
know about) that there is always exactly one physical universe. However, the rules
of Everett's game, which he insists we play to the very end, require that every one
of the physical systems of which that universe is composed—including cats and
measuring instruments and my friend's brain and my own brain—can be, and often
are, in those bizarre superpositions. The various elements of such a superposition,
in the case of brains, correspond to a variety of mutually exclusive points of view
The Quantum Mechanics of Self-Measurement 473
about the world, as it were, all of which are simultaneously associated with one and
the same physical observer.
Needless to say, in some given physical situation, different observers may be
associated with different numbers of such points of view (they may, that is, inhabit
different numbers of Everett worlds). Suppose, for example, that we add a second
friend (Friend # 2) to the little story we just told. Suppose at the end of that story,
when the state of the composite system consisting of p and of the measuring device
for σ_y and of Friend #1 is |α⟩, that Friend #2 measures A, where A is a maximal
observable of that composite system such that A|α⟩ = a|α⟩. Friend #2 carries out
that measurement by means of an A-measuring device (which, according to the
rules of the game, can always be constructed) which interacts with that composite
system and which Friend #2 subsequently looks at to ascertain the result of the
measurement. When that's all done (since the result of this measurement will with
certainty be A = a), things will look like this:
In this state, Friend #1 inhabits two Everett worlds (the world in which σ_y = +1/2
and the world in which σ_y = −1/2), whereas Friend #2 inhabits only one (the
world in which A = a), which by itself encompasses the entire state |β⟩. Moreover,
in his single world, Friend #2 possesses something like a photograph of the two
worlds which Friend #1 simultaneously inhabits (he possesses, that is, a recording
in his measuring device of the fact that A = a). By means of his measurement of
A, Friend #2 directly sees the full superposition of Friend #1's brain states; and
indeed, he can even specify the relative sign between those states.
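Friend #2's measurement can be modelled in the same toy setting (an added sketch; the projector onto |α⟩ stands in for the maximal observable A, since only the |α⟩ eigendirection matters here). A yields a = +1 with certainty on |α⟩, and it is sensitive to the relative sign between the two branches:

```python
import numpy as np

# Composite system: friend (2) x device (2) x particle (2), as before.
def kron_all(*vs):
    out = np.array([1.0 + 0j])
    for v in vs:
        out = np.kron(out, v)
    return out

y_plus  = np.array([1,  1j], dtype=complex) / np.sqrt(2)
y_minus = np.array([1, -1j], dtype=complex) / np.sqrt(2)
up, down = np.eye(2, dtype=complex)

alpha      = (kron_all(up, up, y_plus) + kron_all(down, down, y_minus)) / np.sqrt(2)
alpha_flip = (kron_all(up, up, y_plus) - kron_all(down, down, y_minus)) / np.sqrt(2)

# An observable with |alpha> as eigenstate, eigenvalue a = +1; a stand-in
# for the maximal observable A of the text.
A = np.outer(alpha, alpha.conj())

# On |alpha>, a measurement of A gives a = +1 with certainty...
print(np.real(alpha.conj() @ A @ alpha))
# ...and A distinguishes |alpha> from the state with flipped relative
# sign, which no measurement on the friend's brain alone could do.
print(np.real(alpha_flip.conj() @ A @ alpha_flip))
```

This is the "photograph of the two Everett worlds": the recorded value A = a carries information about the superposition as a whole, including its relative phase.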
Nothing ought to be very surprising in this, and indeed, it was all very well
known to Everett and his readers. So far as Friend #2 is concerned, after all,
Friend #1, whatever else he may be, is a physical system out there in the external
world; and consequently, according to the rules of our game, Friend #1 ought to
be no less susceptible to being measured in superpositions than a single subatomic
particle. But this need not be the very end of the game. One more move, which
is fully in accordance with the rules of the game, is possible; a move that Everett
never mentions. Here it is: Suppose, at the end of the slightly longer story we just
told, when the state of things is |β⟩, that Friend #2 shows his photograph of the
two Everett worlds that Friend #1 simultaneously inhabits to Friend #1. Suppose,
that is, that Friend #1 now looks at the measuring apparatus for A. Well, it's quite
474 David Z. Albert
that in this case the order in which those two measurements are carried out will be
important).
This automaton, then, when |γ⟩ obtains, knows (in whatever sense it may be
appropriate to speak of automata knowing things), accurately and simultaneously,
the values of both A and σ_y, even though those two observables don't commute.
What this means (leaving aside, as I said, all of the foggy questions about what
it might be like from the perspective of the automaton, which is what originally
drove us to the talk about multiple worlds) is that this automaton, in this state,
is in a position to predict, correctly, without peeking, the outcomes of upcoming
measurements of either A or σ_y or both, even though A and σ_y are, according to
the familiar dogma about measurement theory, incompatible.
Moreover, no automaton in the world other than this one (no observer in the
world other than Friend #1, in science fiction talk) can ever, even in principle, be
in a position to simultaneously predict the outcomes of upcoming measurements
of precisely those two observables (even though they can, of course, know either
one). The possibility of Friend #1's being able to make such predictions hinges on
the fact that A is an observable of (among other things) Friend #1 himself. There
is a well-defined sense here in which this automaton, this friend, has privileged
epistemic access to itself.
Let me (by way of finishing up) try to expand on that just a little bit.
There is another famous attempt to suppose that the linear quantum-mechanical
equations of motion are invariably the true equations of motion of the wave-function
of the entire physical world. This attempt goes back to Bohm, and has recently been
championed and further developed by John Bell. It's a hidden variables theory (it
is, more precisely, a completely deterministic hidden variables theory, which ex-
actly reproduces the statistical predictions of non-relativistic quantum mechanics
by means of an averaging over the various possible values of those hidden variables),
and it has the same straightforward sort of realistic interpretation as does, say, clas-
sical mechanics. It's well known that there are lots of things in this theory that one
ought to be unhappy about (I'm thinking mostly about non-locality here); but let's
concentrate, for just a moment, on the fact that such a theory is logically possible.
Since this theory makes all the same predictions as quantum mechanics does, every
one of those predictions, including the ones in our story about quantum-mechanical
automata, will necessarily arise in this theory, too.
That produces an odd situation. Remember the two automata in the story
(Friend #1 and Friend #2). Suppose that |γ⟩ obtains, and suppose that things
are set up so that some future act of #1 is to be determined by the results of
upcoming measurements of ay and A. On Bohm and Bell's theory, there is, right
now, a matter of fact about what that act is going to be, and it follows from what
we discovered about the automata that #1 can correctly predict what that act is going
to be, but not so for automaton #2, nor for any other one, anywhere in the world.
So it turns out that it can arise, in a completely deterministic physical theory,
that an automaton can in principle be constructed that can ascertain certain of its
own acts in advance, even though no other automaton, and no external observer
whatever—supposing even that they can measure with infinite delicacy and infinite
precision—can ascertain them; and that strikes me as something of a surprise.
Perhaps it deserves to be emphasized that there are no paradoxes here, and
no violations of quantum theory from which, after all, it was all derived. We have
simply discovered a new move here, a move that entirely accords with the rules of
quantum mechanics (if the quantum-mechanical rules are all the rules there are)
whereby quantum-mechanical observers can sometimes effectively carry out certain
measurements on themselves. This move just wasn't anticipated by the founders of
quantum mechanics, and it happens that when you make a move like this, things
begin to look very odd, and the uncertainty relations cease to apply in the long
familiar ways.
ACKNOWLEDGMENT
I'm thankful to Deborah Gieringer for her technical assistance in preparing this
paper for publication.
L. A. Khalfin
International Solvay Institutes of Physics and Chemistry, Université Libre de Bruxelles,
CP-231, Campus Plaine, Boulevard du Triomphe, B-1050 Bruxelles, Belgium; permanent
address: Steklov Mathematical Institute of the Academy of Sciences U.S.S.R., Fontanka
27, 191011 Leningrad D-11, U.S.S.R.
"I do not believe in micro- and macro-laws, but only in (structural) laws
of general validity." —A. Einstein
Fast progress in experimental techniques supports more and more thorough
examinations of the applicability of quantum theory far beyond the range of phe-
nomena from which quantum theory arose. For all that, no restrictions in principle
are revealed on its applicability, nor any inherently classical physical systems.
However, according to the Copenhagen interpretation[1] the foundation of quantum
theory is classical ideas (the classical world) taken equally with quantum ideas
rather than being deduced from the latter. The Copenhagen interpretation stipu-
lates the joint application of two description modes, the classical and the quantum,
is the probability of the coincidence of such events: the measurement of the observ-
able A_1j gives the result k_1, that of the observable A_2j, the result k_2, etc.
PROBLEM
To find the general conditions on the values which can be expressed in the
form of Eq. (1).
In general this problem has not been solved up to now. In our work we consider
some of the not-simple cases. But now we will go to the simplest nontrivial case:
i = 1, 2; j = 1, 2; k = 1, 2. Assume for simplicity that A_ij = ±1, so that A_ij² = 1.
THEOREM 1
PROOF It is possible to prove this result for some more general cases, but one can
see the direct and simple elegant proof:
2√2·1 − C = (1/√2)[(A11 − (A21 + A22)/√2)² + (A12 − (A21 − A22)/√2)²] ≥ 0   (2)
where C = A11·A21 + A11·A22 + A12·A21 − A12·A22, each A_ij² = 1, and each A1j
commutes with each A2k.
For the classical case, in which all A11, A12, A21, A22 commute (c-numbers), a
trivial inequality for these c-numbers follows:
A11·A21 + A11·A22 + A12·A21 − A12·A22 ≤ 2·1   (4)
The inequality (4) gives the algebraic structure of the classical Bell-CHSH inequality
for correlation functions
[⟨A11·A21⟩ + ⟨A11·A22⟩ + ⟨A12·A21⟩ − ⟨A12·A22⟩] ≤ 2   (5)
The inequality (2) gives the algebraic structure of the quantum Tsirelson's inequal-
ity for correlation functions
[⟨A11·A21⟩ + ⟨A11·A22⟩ + ⟨A12·A21⟩ − ⟨A12·A22⟩] ≤ 2√2.   (6)
The inequalities (5) and (6) are model-independent; that is, they do not depend
on any physical mechanism or physical parameters, except the space-time parameters
connected with local causality. We see a principal, fundamental gap between the
classical Bell and the quantum Tsirelson inequalities, notably because the quantum
Tsirelson inequalities do not contain the Planck constant. It is interesting to point
out that Tsirelson's quantum inequalities for the general case are the same as for
the simplest spin 1/2 case.
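As a numerical illustration (added here, not part of the original text), the operator identity behind the proof and the bound 2√2 can be checked directly with numpy. The particular spin-1/2 observables below are one standard choice that saturates the bound:

```python
import numpy as np

# Pauli matrices
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

# A11, A12 act on the first particle, A21, A22 on the second,
# so the two groups commute; all four square to the identity.
A11 = np.kron(sz, I2)
A12 = np.kron(sx, I2)
A21 = np.kron(I2, (sz + sx) / np.sqrt(2))
A22 = np.kron(I2, (sz - sx) / np.sqrt(2))

C = A11 @ A21 + A11 @ A22 + A12 @ A21 - A12 @ A22

# The identity used in the proof: C^2 = 4*1 + [A11, A12]*[A22, A21]
comm = lambda X, Y: X @ Y - Y @ X
assert np.allclose(C @ C, 4 * np.eye(4) + comm(A11, A12) @ comm(A22, A21))

# The operator norm of C reaches Tsirelson's bound 2*sqrt(2)
norm_C = np.linalg.norm(C, 2)
print(norm_C)
```

For commuting (classical) choices of the four observables the same expression never exceeds 2, in agreement with inequality (4).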
The class of correlation functions ⟨⟨...⟩⟩, or rather of "behaviors" in the sense
of section 4, allowed by quantum Tsirelson's inequalities is essentially smaller than
that allowed by general probabilistic local causality (see section 4):

[⟨A11·A21⟩ + ⟨A11·A22⟩ + ⟨A12·A21⟩ − ⟨A12·A22⟩] ≤ 4 .   (7)
It is possible to obtain inequalities which, holding true for quantum objects,
approximate the classical Bell inequalities in quasi-classical situations. Such
inequalities, which of course are model-dependent, were derived in Khalfin and
Tsirelson,8 and we called them the quasi-classical analogs of the classical Bell
inequalities. One example of these inequalities is:

[⟨A11·A21⟩ + ⟨A11·A22⟩ + ⟨A12·A21⟩ − ⟨A12·A22⟩] ≤ 2 + c·(ℏ²/σ)   (8)
The general stochastic behavior gives us the general inequality (7). Hidden
deterministic behavior gives us the classical Bell inequalities. It is interesting that
so-called dynamical chaos is also a hidden deterministic behavior. All classical
stochastic phenomena of probability theory are hidden deterministic behaviors.
Only the quantum behavior gives us "real" stochastic phenomena.
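The point about dynamical chaos can be illustrated with a minimal sketch (an added illustration, not from the text): the logistic map is completely deterministic, yet its coarse-grained output is statistically hard to distinguish from coin flips:

```python
import numpy as np

# Deterministic dynamics: the logistic map x -> 4x(1-x)
def orbit(x0, n):
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = 4.0 * x * (1.0 - x)
        xs[i] = x
    return xs

xs = orbit(0.123456789, 50_000)
s = np.where(xs > 0.5, 1.0, -1.0)          # coarse-grained "coin flips"

# The symbol sequence looks statistically like unbiased noise ...
print(s.mean(), np.mean(s[:-1] * s[1:]))   # mean and lag-1 correlation, both small

# ... yet it is completely reproducible from the initial condition:
assert np.array_equal(xs, orbit(0.123456789, 50_000))
```

The randomness here is "hidden deterministic": rerunning with the same seed value reproduces the sequence exactly, which is precisely what quantum behavior, in the sense of this chapter, does not allow.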
H(q1, p1; q2, p2) = (1/2m1)·p1² + (1/2m2)·p2² + (k1/2)·q1² + (k2/2)·q2² + k12·q1·q2   (13)
and let us add to this interaction the fluctuation forces, which for simplicity we
will assume are not correlated for objects 1 and 2. The equations of motion, in the
language of stochastic differential equations, are:

dp1 = −k1·q1·dt − k12·q2·dt + A1·db1 ,  dq1 = (1/m1)·p1·dt
                                                               (14)
dp2 = −k2·q2·dt − k12·q1·dt + A2·db2 ,  dq2 = (1/m2)·p2·dt
484 L A. Khalfin
where b1(t), b2(t) are noncorrelated Wiener processes, the derivatives of which are
white noise processes:

⟨ḃ1,2(s)·ḃ1,2(t)⟩ = δ(s − t)   (15)

and A1, A2 are the intensities of the fluctuating forces. For A1, A2, by using the
fluctuation-dissipation conditions, the following expression is derived:
Remarkably, the condition (17) does not depend on k1, k2. For the simpler case
of identical objects we have, from Eq. (17):

|k12| < (1/ℏ)·A² = (2Γ/ℏ)·kB·T = 2Γ/τtherm ,  τtherm ≡ ℏ/(kB·T)   (18)
For a more general form of interaction with potential energy U(q1, q2), the condition
(18) becomes

|∂²U(q1, q2)/∂q1∂q2| ≤ 2Γ/τtherm   (19)

which defines the corresponding characteristic time; for times t larger than this
characteristic time we see classical (without any quantum interference) dynamics.
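A sketch of how the stochastic equations (14) can be integrated numerically with an Euler-Maruyama scheme. All parameter values are illustrative assumptions, and the dissipative terms implied by the (omitted) fluctuation-dissipation conditions (16)-(17) are not included:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed values, not ones given in the text)
m1 = m2 = 1.0
k1 = k2 = 1.0
k12 = 0.2            # bilinear coupling of Eq. (13)
A1 = A2 = 0.1        # noise intensities of Eq. (14)
dt, nsteps = 1e-3, 20_000

q1 = q2 = 1.0
p1 = p2 = 0.0
for _ in range(nsteps):
    db1, db2 = rng.normal(0.0, np.sqrt(dt), 2)   # independent Wiener increments
    dp1 = (-k1 * q1 - k12 * q2) * dt + A1 * db1
    dp2 = (-k2 * q2 - k12 * q1) * dt + A2 * db2
    q1 += (p1 / m1) * dt
    q2 += (p2 / m2) * dt
    p1 += dp1
    p2 += dp2

print(q1, p1, q2, p2)   # one sample of the noisy trajectory at t = nsteps*dt
```

Each Wiener increment is drawn with variance dt, so that the discrete sum reproduces the white-noise correlation (15) in the continuum limit.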
H|Ψ(t)⟩ = iℏ·∂|Ψ(t)⟩/∂t ,  H = const(t)

|Ψ(t = 0)⟩ = |Ψ0⟩ ,  ⟨Ψ0|Ψ0⟩ = 1

H|φk⟩ = Ek|φk⟩ ,  ⟨φk|φk'⟩ = δkk'

H|φE⟩ = E|φE⟩ ,  ⟨φE|φE'⟩ = δ(E − E')
From the condition ⟨Ψ0|Ψ0⟩ = 1 it follows that there must exist (independently of
H) some self-adjoint operator H0 for which |Ψ0⟩ is an eigenvector of the discrete
spectrum of H0:

H0|Ψ0⟩ = E0|Ψ0⟩   (22)

If we choose a different initial vector state |Ψ0⟩, then H0 will also be different.
The initial vector state |Ψ0⟩ thus defines, additionally to and independently of H,
the information on the "preparation" or the origin of the investigated physical system.
From H and H0 we can define the interaction part of the Hamiltonian, Hint =
H − H0.
Let us now define the decay amplitude p(t) = ⟨Ψ0|Ψ(t)⟩. From the point of view
of probability theory, the decay amplitude p(t) is a characteristic function.
DEFINITION. The solution |Ψ(t)⟩ (which was defined by the operator H and the
initial vector |Ψ0⟩, or the operator H0; see Eq. (22)) we call irreversible if

|p(t)| → 0 as t → ∞ ,   (25)

∫0∞ |p(t)|²·dt < ∞ .   (27)
where c(E) is the continuous "preparation function," the following decomposition
follows for t > (ℏ/E0):

p(t) ≅ exp(−(i·E0 + Γ)·t/ℏ) + (ℏ/π)·Γ·c(0)/((E0² + Γ²)·t) + o(1/t) .   (29)
The exponential term (dominant for t of order ℏ/Γ) does not depend on the
"preparation function," and the nonexponential term for Γ ≪ E0 is very small for
these times. If t → ∞, Γ → 0 with Γ·t = const, all nonexponential terms disappear.
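The origin of the nonexponential term, a spectrum bounded from below, can be seen in a small numerical sketch (an added illustration with assumed parameters, ℏ = 1; a truncated Lorentzian weight stands in for the energy distribution):

```python
import numpy as np

# Energy distribution: a Lorentzian of width Gamma centered at E0,
# cut off at E >= 0 because the spectrum is bounded from below (hbar = 1)
E0, Gamma = 10.0, 0.5          # illustrative values
dE = 1e-3
E = np.arange(0.0, 400.0, dE)
w = (Gamma / np.pi) / ((E - E0) ** 2 + Gamma ** 2)
w /= w.sum() * dE              # normalize on the half-line

def p(t):
    # decay amplitude p(t) = integral of w(E)*exp(-iEt) dE
    return np.sum(w * np.exp(-1j * E * t)) * dE

for t in (1.0, 5.0, 40.0):
    print(t, abs(p(t)) / np.exp(-Gamma * t))
# the ratio drifts far above 1 at large t: a power-law tail replaces
# the exponential, precisely because the spectrum has a lower bound
```

Without the cutoff (a full-line Lorentzian) the decay would be exactly exponential for all times.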
Now we investigate the problem of the foundation of statistical physics by
using methods which are analogous to the methods of the quantum decay theory. From
section 4 it follows that we must investigate this problem within quantum theory.
First of all, we must understand that the problems of statistical physics are only
a special kind of the general problems of quantum theory, and must be defined
by some additional structure. These problems are defined by the full Hamiltonian H,
the initial vector state |Ψ0⟩ (or by H0), and by an additional self-adjoint operator A:
The full information in statistical physics is in the set of probabilities {Pk(t)}, ∀k,
where

Pk(t) = |pk(t)|² ,  pk(t) = ⟨φk|Ψ(t)⟩   (31)
It is very essential to point out that the full Hamiltonian H includes all quantum
fields which define the interactions, but A defines the finite number N of particles
in the finite box (see Figure 2) for these particles, but not for the fields! For this
reason the full Hamiltonian H, which includes the quantum fields in infinite space,
has an absolutely continuous spectrum, which gives us the dynamical (spontaneous)
origin of irreversibility.
Now we can define the usual entropy of statistical physics:
for which we must obtain, under some special conditions, the proof of the Second Law
(Boltzmann H-Theorem). The usual von Neumann entropy is a dynamical invariant
for the general problems of quantum theory and has no direct correspondence to the
entropy (32) for the problems of statistical physics (this is evident because for the
von Neumann entropy the Second Law is not true: this entropy is a dynamical
invariant of quantum theory).
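The contrast between the invariant von Neumann entropy and an entropy built from the probabilities Pk(t) of Eq. (31) can be checked in a small numerical sketch (the random Hamiltonian and the fixed reference basis are illustrative assumptions, ℏ = 1):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random Hamiltonian and initial pure state on a small Hilbert space
n = 8
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (A + A.conj().T) / 2
psi0 = np.zeros(n, dtype=complex)
psi0[0] = 1.0

evals, evecs = np.linalg.eigh(H)

def state(t):
    # |psi(t)> = exp(-iHt)|psi0>
    return evecs @ (np.exp(-1j * evals * t) * (evecs.conj().T @ psi0))

def vn_entropy(rho):
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log(lam)).sum())

def diag_entropy(psi, basis):
    P = np.abs(basis.conj().T @ psi) ** 2   # probabilities P_k(t) as in Eq. (31)
    P = P[P > 1e-12]
    return float(-(P * np.log(P)).sum())

basis = np.eye(n)   # fixed reference basis playing the role of {phi_k}
for t in (0.0, 1.0, 5.0):
    psi = state(t)
    print(t, vn_entropy(np.outer(psi, psi.conj())), diag_entropy(psi, basis))
# the von Neumann entropy stays 0 for the pure state at every time,
# while -sum P_k ln P_k changes under the unitary evolution
```

This only illustrates the invariance statement; it is not, of course, a proof of the Second Law.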
From Eqs. (30) and (31) it is easy to see that

pk(t) = ∫0∞∫0∞ exp(−i(E − E′)t/ℏ)·c(E)·c*(E′)·ρk(E′, E)·dE·dE′ ,  Σk ak·ρk(E′, E) = ρ(E′, E) ,

Pk(∞) = ∫0∞ βk(E)·|c(E)|²·dE ,

Pk(t) ≅ Pk(∞) + γk·exp(−2Γt/ℏ) + a0·exp(−Γt/ℏ)·cos(E0t/ℏ)·Γℏ/((E0² + Γ²)·t) + b0·Γ²ℏ²/((E0² + Γ²)²·t²) .   (36)
The usual axiomatic statistical physics cannot be the exact theory: ergodicity
and mixing are not exactly true, the decrease of the correlation functions is
nonexponential, and the equilibrium distribution depends on Γ (on the relaxation
time). But for usual cases (Γ ≪ E0) the axiomatic statistical physics is a very
good approximation; however, the accuracy of this approximation is not
homogeneous over all problems of statistical physics.
FIGURE 3  The entropy S(t) as a function of time, with the characteristic times t1, t2, t3 marked.
BOLTZMANN H-THEOREM.11 If

then

a. S(t) < S(∞) ;  b. dS(t)/dt ≥ 0 , ∀t ∈ [0, ∞) .   (39)
From this theorem it is possible, under some conditions on H, |Ψ0⟩, and A, to prove
the Second Law for some big but finite interval of time t ∈ [t1, t2] (see Figure 3).
But it can also be proved that for a finite small interval of time t ∈ [t2, t3], in which
the two nonexponential terms in Eq. (36) are of the same order, for general initial
conditions |Ψ0⟩ the Second Law will not be true (see Figure 3). This gives us the
first dynamical mechanism, free of special conditions, for the origin of order from
chaos. The interval of time t ∈ [t2, t3] is an interval of very big times for usual
physical systems (God created life (order) on the last day).
Classical Bell's and Quantum Tsirelson's Inequalities 491
ACKNOWLEDGMENTS
As indicated before, the work reviewed here was done in collaboration with Dr. B. S.
Tsirelson. I am indebted to him for the interesting joint work and interesting
discussions. My big thanks to the Santa Fe Institute, especially to Prof. J. A. Wheeler
and Dr. W. H. Zurek, for the invitation to the workshop "Complexity, Entropy and
the Physics of Information" (Santa Fe, New Mexico, May 29-June 2, 1989). My
big thanks also to the participants of this workshop for interesting discussions. The
final version of this report was prepared at the Santa Fe Institute. My big thanks to
Dr. George A. Cowan, President of the Santa Fe Institute, for the warm hospitality
and the pleasant conditions for scientific work. I thank Prof. T. Toffoli and Dr.
W. H. Zurek for improvement of the English version of this report.
REFERENCES
1. Caldeira, A. O., and A. J. Leggett. Phys. Rev. A 31 (1985):1059.
2. Cirel'son, B. S. (a.k.a. B. S. Tsirelson). Lett. Math. Phys. 4 (1980):93.
3. Diosi, L. Phys. Lett. A122 (1987):221.
4. Diosi, L. Phys. Lett. A129 (1988):419.
5. Fock, V. A., and N. S. Krylov. JETP 17 (1947):93.
6. Joos, E., and H. D. Zeh. Zeitschr. Phys. Ser. B 59 (1985):223.
7. Joos, E. In "New Techniques and Ideas in Quantum Measurement Theory."
Ann. N.Y. Acad. Sci. 480 (1986):6.
8. Khalfin, L. A. DAN USSR 115 (1957):277.
9. Khalfin, L. A. JETP 33 (1958):1371.
10. Khalfin, L. A. DAN USSR 162 (1965):1273.
11. Khalfin, L. A. Theor. & Math. Phys. 35 (1978):425.
12. Khalfin, L. A. Uspekhi Matematicheskikh Nauk 33 (1978):243.
13. Khalfin, L. A. Phys. Lett. 112B (1982):223.
14. Khalfin, L. A. "Bell's Inequalities, Tsirelson Inequalities and K⁰-K̄⁰, D⁰-D̄⁰,
B⁰-B̄⁰ Mesons." Report on the scientific session of the Nuclear Division
of the Academy of Sciences USSR, April 1983; unpublished.
15. Khalfin, L. A., and B. S. Tsirelson. "Quantum and Quasi-Classical Analogs
of Bell's Inequalities." In Proceedings of the Symposium on the Foundations of
Modern Physics, 1985, edited by P. Lahti et al. New York: World Scientific,
1985, 441.
16. Khalfin, L. A. "The Problem of the Foundation of the Statistical Physics, the
Nonexponentiality of the Asymptotic of the Correlation Functions and the
Quantum Theory of Decay." In Abstracts of the First World Congress of the
Bernoulli Society, 1986, edited by Yu. V. Prokhorov, Vol. II. Nauka, 1986, 692.
17. Khalfin, L. A. and B. S. Tsirelson. "A Quantitative Criterion for the Applica-
bility of the Classical Description within the Quantum Theory." In Proceed-
ings of the Symposium on the Foundations of Modern Physics, 1987, edited
by P. Lahti et al. New York: World Scientific, 1987, 369.
18. Khalfin, L. A. "The Problem of the Foundation of Statistical Physics and
the Quantum Decay Theory." Paper presented at the Stefan Banach Inter-
national Mathematical Center, September 1988, Warsaw, Poland; to be pub-
lished.
19. Khalfin, L. A., and B. S. Tsirelson. "Quantum-Classical Correspondence in
Light of Bell's Inequalities." To be published.
20. Unruh, W. G., and W. H. Zurek. Phys. Rev. D40 (1989):1071.
21. Wootters, W. K., and W. H. Zurek. Phys. Rev. D19 (1979):473.
22. Zurek, W. H. In "New Techniques and Ideas in Quantum Measurement The-
ory." Ann. N.Y. Acad. Sci. 480 (1986):89.
INTRODUCTION
Several significant technical advances concerning the interpretation of quantum
mechanics have been made more or less recently, mostly during the last decade.
I refer particularly to the discovery and study of environment-induced superselection
rules,1,2,3,8,25,26 some new general results in semi-classical physics,4,9,16 the
distinction to be made between a macroscopic system and a classically behaving
one,12,13 and the possibility of describing a consistent history of a quantum system as
well as a description of a quantum system by ordinary Boolean logic.14 It turns
out that all of them can now be joined together to provide a completely new inter-
pretation of quantum mechanics to be called here the logical interpretation. This
name is not coined to mean that the progress made along the lines of logic is more
important than any other advance but to stress the unifying role of logic when
bringing them together into a consistent theory. The logical interpretation stands
upon many fewer axioms than the Copenhagen interpretation and, in fact, upon
just a unique universal axiom, and it is not plagued by imprecisely defined words or
notions. Its practical consequences, however, coincide mostly with what comes out
of the Copenhagen interpretation, except for the removal of some of its disturbing
paradoxical features.
There is no consensus as to what must be considered the most basic difficulties
of conventional quantum mechanics. One may use, however, the hindsight provided
by recent advances to identify them with two basic problems, having to do re-
spectively with the status of common sense and the status of empirical facts in
quantum mechanics. The first problem comes out of the huge logical gap separat-
ing the mathematical framework of the theory (with its Hilbert space and so on)
from the ordinary direct physical intuition one has of ordinary physical objects. As
will be seen, this is a real problem boiling down to the relation existing between
physical reality and its description by mathematics and logic; one will have to make
this correspondence clear by stating explicitly how it must be formulated.
The second problem comes from the intrinsically probabilistic character of
quantum mechanics: Remembering that a theoretical probability can only be
checked experimentally by performing a series of trials and noticing that this proce-
dure makes sense only if the result of each individual trial is by itself an undoubtable
fact, one sees that quantum mechanics, as an intrinsically probabilistic theory, must
nevertheless provide room for the certainty of the data shown by a measuring
device, i.e., for facts. The solution of this dilemma will involve a proof of the validity
of some semi-classical determinism within the framework of quantum mechanics.
A complete interpretation will be obtained by solving these two problems. The
general strategy will, however, strongly differ from the Copenhagen approach:
Classically behaving objects, giving rise to observable facts obeying determinism
and allowing their common sense description by usual logic, will be interpreted by
quantum mechanics and not the other way around. This direct interpretation of
what is observed by the most fundamental form of the theory is not only what
should be expected from science but it also turns out to be both straightforward
and fruitful.
GENERAL AXIOMS
The following basic axioms of quantum mechanics will be taken for granted:
Axiom 1 associates a Hilbert space H and an algebra of operators with an
individual isolated physical system S or, more properly, with any theoretical model
of this system.
Axiom 2 defines dynamics by the Schrödinger equation, using a hamiltonian H.
The corresponding evolution operator will be written as U(t) = exp(−2πiHt/h).
Some Progress in Measurement Theory 497
E = ∫ |x⟩⟨x| dx ,
with the predicate to give it a meaning in Hilbert space grammar. More generally,
to any set C in the spectrum of an observable A, one can associate a predicate [A, C]
meaning "A is in C" and a well-defined projector E. The time-indexed predicate
stating that the value of A is in C at time t can be associated with the projector
E(t) = U⁻¹(t)EU(t) by taking into account the Schrödinger equation. Conversely,
any projector can be used to define a predicate as can be shown by taking A = E
and C = {1} in the spectrum of the projector E. One can now define states:
Axiom 4 assumes that the initial state of the system at time zero can be
described by a predicate E0. This kind of description can be shown to represent
correctly a preparation process once the theory is complete. A state operator ρ will
be defined as the quotient E0/TrE0. For instance, ρ = E0 = |ψ0⟩⟨ψ0| in the case
of a pure state. We shall also freely use, when necessary, the concept of a density
matrix.
HISTORIES
As introduced by Griffiths,6 a history of a quantum system S can be considered as
a series of conceptual snapshots describing some possible properties of the system
498 Roland Omnes
at different times. It will be found later on that a history becomes a true motion
picture in the classical limit when the system is macroscopic.
More precisely, let us choose a few ordered times 0 < t1 < ··· < tn, some
observables A1, ···, An which are not assumed to commute, and some ranges of values
C1, ···, Cn for each of these observables. A history [A1, ···, An; C1, ···, Cn; t1, ···, tn]
is a proposition telling us that at each time tj (j = 1, ···, n), Aj has its value in
the range Cj.
Griffiths proposed to assign a probability to such a history. We shall write it in
the form

w = Tr(En(tn)···E1(t1)·ρ·E1(t1)···En(tn)) .   (1)
Griffiths used a slightly different expression and he relied upon the Copenhagen
interpretation to justify it. Here Eq. (1) will be postulated with no further justifi-
cation, except to notice that it is "mathematically natural" when using Feynman
path summations because a projector Ej(tj) is associated with a window through
which the paths must go at time tj. It should be stressed that w is just for the time
being a mathematical measure associated with the story, having not yet any em-
pirical meaning that could be found by a series of measurements. Quite explicitly,
we don't assume that we know right now what a measurement is.
Griffiths noticed that some restrictions must be imposed upon the projectors
entering Eq.(1) in order to satisfy the basic axioms of probability theory and par-
ticularly the additivity property of the measures for two disjoint sets. To show what
that means, it will be enough to consider the simplest case where time takes only
two values t1 and t2, denoting by E1 (respectively E2) the projector associated with
a set C1 (respectively C2) and by Ē1 = 1 − E1 the orthogonal projector. In that
case, it can be proved that all the axioms of probability calculus are satisfied by
the definition in Eq. (1) if the following consistency condition holds:

Re Tr(E2(t2)·E1(t1)·ρ·Ē1(t1)) = 0 .   (2)
One knows how to write down similar necessary and sufficient conditions in the
general case. The essential point is that they are completely explicit.
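Since the conditions are completely explicit, they can be checked numerically. The following sketch (an added illustration; the unitary and the projectors are arbitrary choices, not taken from the text) evaluates two-time history probabilities from Eq. (1) for a spin 1/2 and tests whether the probabilities of disjoint histories add up correctly:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
rho = np.array([[1, 0], [0, 0]], dtype=complex)          # initial state |0><0|

def proj(op, sign):
    # spectral projector of an observable with eigenvalues +-1
    return (np.eye(2) + sign * op) / 2

theta = 0.7                                              # arbitrary evolution angle
U = np.cos(theta) * np.eye(2) - 1j * np.sin(theta) * sx  # U = exp(-i*theta*sx)
E2 = U.conj().T @ proj(sz, +1) @ U                       # Heisenberg projector E2(t2)

def w(E1, E2t):
    # Eq. (1) for two times, with t1 = 0 so that E1(t1) = E1
    return np.real(np.trace(E2t @ E1 @ rho @ E1 @ E2t))

results = {}
for name, A1 in (("sz", sz), ("sx", sx)):
    added = w(proj(A1, +1), E2) + w(proj(A1, -1), E2)    # sum over disjoint histories
    total = np.real(np.trace(E2 @ rho))                  # probability ignoring t1
    results[name] = (added, total)
    print(name, added, total)
# with A1 = sz the two probabilities add up (a consistent family);
# with A1 = sx they do not, so that family of histories is excluded
```

The failure in the second case is exactly a nonzero interference term of the kind the consistency condition forbids.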
LOGICAL STRUCTURE
Griffiths' histories will now be used to describe logically a system in both a rigorous
and an intuitive way.
First recall what logicians call a logic or, more properly, an interpretation
of formal logic. It consists of the following: one defines a field of propositions
(a, b, ···) together with four operations or relations among them, giving a meaning
to "a or b," "a and b," "not a," and "a implies b," this last relation being denoted
by a ⇒ b or "if a, then b." This is enough to do logic rigorously if some twenty or
so abstract rules are obeyed by "and, or, not, if...then." This kind of logic is also
called boolean.
Probability calculus is intimately linked with logic. One can make this clear by
choosing, for instance, two times t1 and t2 and two observables A1 and A2. The
spectrum σ1 of A1 will be divided into several regions {C1α}, and similarly for σ2.
An elementary rectangle C1α × C2β in the direct product σ1 × σ2 will be considered
as representing a Griffiths history or what a probabilist would call an elementary
event. A union of such sets is what a probabilist calls an event, and here it will be
called a proposition describing some possible properties of the system.
As usual in set theory, the logical operations "and, or, not" will be associated
with the intersection, the union, and the complementation of sets, so that these three
logical rules and the field of propositions or events are well defined.
When a proposition a is associated with a union of two disjoint sets a1, a2, each
one representing a history, its probability will be defined by

w(a) = w(a1) + w(a2) .   (3)

Conditional probabilities are defined as usual by

w(b | a) = w(a and b) / w(a) .   (4)
One can now introduce a unique and universal rule for the interpretation of
quantum mechanics, stating how to describe the properties of a physical system in
ordinary terms and how to reason about these properties:
Axiom 5: Any description of the properties of a system should be framed into
propositions belonging to a consistent logic. Any reasoning concerning them should
be the result of an implication or a chain of implications.
From here on, when the word "imply" is used, it will be in the sense of
this axiom. The logical construction allows us to give a clear-cut meaning to all the
reasonings an experimentalist is bound to make about his apparatuses. In practice,
it provides us with an explicit calculus of propositions, selecting automatically the
propositions making sense and giving proofs of correct reasonings. Two examples
will show how this works.
In a two-beam interference experiment, it is possible to introduce the
elementary predicates stating that, at some convenient time t2, a particle is in some
region of space where the two beams are recombined. All the predicates corresponding
to different regions describe the possible outcomes of the experiment, although
one does not yet know how to describe a counting device registering them. They
constitute a consistent logic. It is also possible to define a projector expressing that
the particle followed the upper beam but, lo and behold, there is no consistent logic
containing this predicate together with the previous predicates describing the
outcomes of the experiment. This means that logic precedes measurement. There is no
need to invoke an actual measurement to discard as meaningless the proposition
stating that the particle followed the upper beam. Logic is enough to dispose of
it according to the universal rule of interpretation, because there is no consistent
logic allowing such a statement.
More positively, one may also consider a particle coming out of an isotropic
S-state with a somewhat well-defined velocity. This property can be described by
an initial projector E0. Another projector E2 corresponds to the predicate stating
that the particle has its position within a very small volume δV2 around a point
x2 at time t2. Then one can explicitly choose a time t1 < t2, construct a volume
V1 that has its center on the way from the source to x2 and is big enough, and
prove the logical implication: "The particle is in δV2 at time t2 ⇒ the particle is in
V1 at time t1." So one can prove in this logical framework that the particle went
essentially along a straight trajectory. Similar results hold for the momentum at
time t1. To speak of position and momentum at the same time is also possible, as
will be seen later on, but with some restrictions.
Simple as they are, these two examples show that the universal rule of inter-
pretation is able to select meaningful propositions from meaningless ones and also
to provide a rational basis for some common sense statements which had to be
discarded by the Copenhagen interpretation.
CLASSICAL LIMIT
What we have called the universal rule of interpretation makes little avail of what
Bohr could have also called a universal rule of interpretation; namely the prop-
erties of a macroscopic device are described by classical physics. In fact, what he
really needed from classical physics was not so much classical dynamics as classical
logic where a property can be held to be either true or false, with no probabilistic
fuzziness.
Bohr's assumption is not as clear-cut as it once seemed since Leggett has shown
that some macroscopic systems consisting of a superconducting ring that has a
Josephson weak link can be in a quantum state.12,13 As a consequence, nobody
seems to be quite sure anymore what the Copenhagen interpretation really states
in this case.
The way out of this puzzle will be found by showing why and when classical
physics, i.e., classical dynamics together with classical logic, holds true as a conse-
quence of the universal interpretative rule. This is, of course, a drastic change of
viewpoint as compared with the familiar course of physics since it means that one
will try to prove why and when common sense can be applied rather than taking
it for granted as a gift of God. In that sense, it is also a scathing attack against
philosophical prejudice.
To begin with, one must make explicit what is a proposition in classical physics.
One may consider, for instance, giving the position and the momentum of a system
within some specified bounds. Such a statement is naturally associated with a cell
C in classical phase space (in that case a rectangular cell). Since motion will deform
such a cell, it looks reasonable to associate a classical predicate with a more or less
arbitrary cell in phase space. It will also be given a meaning as a quantum predicate
if one is able to associate a well-defined projector E(C) in Hilbert space with the
classical cell C in phase space.
If one remembers that, in semi-classical approximations, each quantum state
counts for a cell of volume hⁿ, n being the number of degrees of freedom, two
conditions should obviously be asked of the cell C:
1. It must be big enough, i.e., its phase space volume must be much larger than hⁿ.
2. It should be bulky enough, and with a smooth enough boundary, to be well tiled
by elementary regular cells.
This last condition can be made quite precise and, when both conditions are
met and the cell is simply connected, i.e., in one piece with no hole, we shall say
that the cell is regular.
Now there is a theorem stating that an approximate projector E(C) can be
associated with such a regular cell.10,15 To be precise, one can define it in terms
of coherent (gaussian) states |g_qp⟩ with average values (q, p) for their position and
momentum, putting

E(C) = ∫_C |g_qp⟩⟨g_qp| dq·dp/hⁿ .   (5)
It is easily found that the trace of E(C) is the semi-classical average number
N (= volume of C/hⁿ) of quantum states in C. In fact, E(C) is not exactly a
projector, but one can prove that
where L and P are typical dimensions of C along the configuration space and
momentum space directions. The kind of bound on the trace of an absolute-value
operator met in Eq. (6) is exactly what is needed to obtain classical logic from
quantum logic. Using E(C), or a true projector near enough to it, one is therefore able
to state a classical property as a quantum predicate. This kind of theorem relies
heavily upon microlocal analysis and, as such, it is non-trivial.
One may extend this kind of kinematical property to dynamical properties
by giving a quantum logical meaning to the classical history of a system. To do
so, given the hamiltonian H, one must first find the Hamilton function h(q, p)
associated with it. The answer is given by what is called in microlocal analysis the
Weyl symbol of the operator H and, in more familiar terms, the relation between
H and h(q, p) is exactly the one occurring between a density matrix ρ and the
associated Wigner distribution function23,24 f_ρ(q, p).
Once the Hamilton function h(q, p) is thus defined, one can write down the
classical Hamilton equations and discover the cell C1 which is the transform of
an initial regular cell C0 by classical motion during a time interval t. Of particular
interest is the case when C1 is also regular, and one will then say that the hamiltonian
(or the motion) is regular for the cell C0 during the time interval t. It will be seen
that regular systems are essentially deterministic, hence their great interest.
Since C0 and C1 are both regular, one can associate with them two approximate
projectors E0 and E1 as given by Eq. (5), satisfying condition (6). If E0 were treated
like a state operator, it would evolve according to quantum dynamics to become
after a time t the operator
Here ε is a small number depending upon C0, C1, and t, expressing both the
effect of classical motion and wave packet expansion. In a nutshell, this theorem
tells us that quantum dynamics logically coincides with classical dynamics, up to
an error of order ε, at least when regular systems are considered.
This theorem can be used to prove several results concerning the classical
behavior of a regular system. Considering several times 0 < t1 < ··· < tn and an
initial regular cell C0 becoming, successively via classical motion, the regular cells
C1, ···, Cn, one can use the projectors associated with these cells and their
complements to build up several quantum propositions. One can then use Eq. (8) to prove
that the quantum logic containing all these predicates is consistent. Furthermore,
if one denotes by [Cj, tj] the proposition stating that the system is in the cell Cj
at time tj [as characterized by the value 1 for the projector E(Cj)], one can prove
the implications

[Cj, tj] ⇒ [Ck, tk]   (9)

whatever the couple (j, k) in the set (1, ···, n). This implication is valid up to an
error ε, ε being controlled by the characteristics of the cells and the time tn, as
explained above.
Eq. (9) has far-reaching consequences. It tells us that classical logic, when
expressing the consequences of classical dynamics for a regular system and regular
cells, is valid. Of course, it is only valid up to a possible error ε, as shown by the
example of the Earth leaving the Sun or of a car getting out of a parking lot by
tunnel effect. This kind of probability is essentially the meaning of the number ε,
and its value is specific to each special case to be considered.
Furthermore, the implications in Eq. (9) entail that the properties of a regular
system exhibit, at least approximately, determinism (since the situation at some
time tj implies the situation at a later time tk). Such a system can also keep a
record or a memory (since the situation at a time tj implies the situation at an
earlier time tk). It will be convenient to call a potential fact such a chain of mutually
implying classical propositions. This name is used because determinism and recording
are essential characteristics of facts, but one should not, however, forget that
at the present stage the theory is still only just talk-talk-talk with no supporting
experiments, hence the term "potential," meaning an imaginary possibility.
Since Hagedorn has shown that wave packet spreading is mainly controlled by
quantities known from classical dynamics,7 the property of regularity can in principle
be checked completely within classical dynamics. An obvious counter-example of
a system not behaving regularly is provided by a superconducting quantum inter-
ference device in a quantum situation described by Leggett12,13 and investigated
by several experimentalists.18,19,20,21 Another example is given by a K-flow after
a time t large enough to allow a strong distortion of cells by mixing and we shall
come back to it later on.
atoms and the electrons in the ball and the wire are the microscopic coordinates.
Their number N is very large and they are collectively called the environment.
One may start from an initial situation where the collective coordinate is given
and the velocity is zero. More properly, this can be achieved by a gaussian state |ψ⟩
realizing these conditions on the average. It may be convenient to assume that the
ball and the wire are initially at zero temperature, so that the environment is in its
ground state |0⟩. So, the complete description of this initial state is given by

|Ψ⟩ = |ψ⟩ ⊗ |0⟩ .   (10)
Naively, one would say that the motion of the pendulum will generate
deformations of the wire and therefore elastic waves, or phonons, leading to dissipation.
If one compares two clearly different initial situations |ψ1⟩ and |ψ2⟩, the amount
of dissipation in each case after the same time interval will be different, so that the
corresponding states of the environment will become practically orthogonal as soon
as dissipation takes place.
Consider now the initial state

|Ψ⟩ = (a1|ψ1⟩ + a2|ψ2⟩) ⊗ |0⟩

and the density operator ρ = |Ψ⟩⟨Ψ|. The collective density matrix ρc, describing
only the collective coordinate, will be defined as the partial trace of ρ over the
environment. Putting |ψ⟩ = a1|ψ1⟩ + a2|ψ2⟩, which is a state of the collective
degrees of freedom only, one finds easily that

ρc(0) = (a1|ψ1⟩ + a2|ψ2⟩)(a1*⟨ψ1| + a2*⟨ψ2|) .   (11)
On the other hand, the orthogonality of environmental states noted previously
gives, once some dissipation has taken place,
$$\rho_c(t) = |a_1|^2\,|\varphi_1(t)\rangle\langle\varphi_1(t)| + |a_2|^2\,|\varphi_2(t)\rangle\langle\varphi_2(t)|, \qquad (12)$$
the state $|\varphi_1(t)\rangle$ being related to the initial state $|\varphi_1\rangle$ in a way exhibiting motion
and damping which need not interest us here. The essential point is the diagonal
form of $\rho_c(t)$, showing the disappearance of phase relations between the two states, or
what is called an effective superselection rule.25,26 It shows that the corresponding
potential facts are well separated (distinct), and the theory of measurement that
follows will also show them to be exclusive.
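This decoherence mechanism can be checked numerically. The following sketch (a two-state toy model invented for illustration, not taken from the chapter) builds the state $a_1|\varphi_1\rangle|e_1\rangle + a_2|\varphi_2\rangle|e_2\rangle$ and traces out the environment; the off-diagonal elements of the collective density matrix are proportional to the overlap $\langle e_1|e_2\rangle$, so they vanish as soon as dissipation has driven the environment states to orthogonality.

```python
import numpy as np

def reduced_density_matrix(a1, a2, overlap):
    """Collective density matrix for |Psi> = a1|phi1>|e1> + a2|phi2>|e2>,
    where <e1|e2> = overlap measures how little dissipation has
    distinguished the two environment states (1: none, 0: complete)."""
    phi1, phi2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    e1 = np.array([1.0, 0.0])
    e2 = np.array([overlap, np.sqrt(1.0 - overlap**2)])  # <e1|e2> = overlap
    psi = a1 * np.kron(phi1, e1) + a2 * np.kron(phi2, e2)
    rho = np.outer(psi, psi.conj())
    # partial trace over the two-dimensional environment factor
    return np.trace(rho.reshape(2, 2, 2, 2), axis1=1, axis2=3)

a = 1.0 / np.sqrt(2.0)
rho_before = reduced_density_matrix(a, a, overlap=1.0)  # no dissipation yet
rho_after = reduced_density_matrix(a, a, overlap=0.0)   # environments orthogonal
print(rho_before)  # off-diagonal terms present: interference survives
print(rho_after)   # diagonal form: effective superselection
```

For any intermediate overlap, the off-diagonal element equals $a_1 a_2^*\langle e_2|e_1\rangle$, so the suppression of interference tracks the distinguishability of the environment states.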
As is well known, these naive arguments can be replaced by serious
proofs,1,2,3,8,25,26 upon which we shall not elaborate, except for a significant remark.
The objection has been raised that effective superselection rules do not provide
a final proof of fact separation, for two different reasons:25,26
1. When the collective system is a harmonic oscillator and the environment con-
sists of a bath of harmonic oscillators linearly coupled to it, one can prove
Some Progress in Measurement Theory 505
occurrence of a fact, but it is able to make room for this uniqueness. Maybe there
is no cause after all, and the theory just describes what really is.
To get to these deep (or slippery) questions, one can follow Heisenberg's conven-
tion by calling true an actual fact (i.e., a unique recorded past fact as opposed to a
potential one). However, one may go further by relying upon the non-contradiction
theorem mentioned in Section 4 and consider a statement as reliable when it is the
logical consequence of a fact. For instance, when I see as a fact the track of a particle
in a bubble chamber, I can assert reliably that it came essentially along a straight
line before being detected. This is a simple instance where the somewhat formal
present theory is nearer to common sense than the Copenhagen interpretation.
MEASUREMENT THEORY
Measurement theory now becomes a mere exercise.14 To be specific, we shall only
consider here the measurement of an observable A belonging to a physical system
Q when the eigenvalues $\{a_n\}$ of A are non-degenerate and discrete, and the mea-
surement is of the so-called first kind, preserving this eigenvalue. There is no special
difficulty in treating more general cases.
A measuring apparatus M will be used to measure the observable A. It will
be convenient to consider a collective variable B of M as the measurement datum.
One can adapt the theory of facts to the case where there is friction and damping.
This allows us to consider as data the final position of a dial on a counter, or its
digital recording. In that case, the observable B can only take, after an irreversible
interaction with the environment lasting a time $\delta$, some values $b_0, b_1, \ldots, b_n, \ldots$,
which are the experimental data. Initially, B has the neutral value $b_0$. It should
be stressed that the measuring device is treated here by quantum mechanics but,
nevertheless and consistently, data are treated like facts.
It will be assumed that Q and M are initially non-interacting and that, because of
some wave-packet overlapping, they begin to interact at time $t_0$ and do not interact
any more after time $t_1 = t_0 + \delta$, when M has registered the data.
M will be assumed to be a perfect measuring apparatus of the first kind for
the observable A. This property can be made explicit by introducing the evolution
operator $S = U(t_0, t_1)$ for the Q + M system: it will be assumed that $S\,|a_n\rangle_Q\,|b_0, r\rangle_M$
(i.e., the effect of the interaction upon the initial state $|a_n\rangle$ and a
state of M characterized by the neutral initial marking $b_0$ and degeneracy indices
$r$) is only a linear superposition of some states $|a_q\rangle_Q\,|b_m, r'\rangle_M$, where
$q = m = n$. This semi-diagonality of the S-matrix is the only ingredient that one
needs to completely define a measurement.
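The semi-diagonality condition can be made concrete in a finite-dimensional sketch (an invented toy model, not the chapter's construction): take Q with two eigenstates and M with pointer values $b_0, b_1, b_2$, and let S be the permutation sending $|a_n\rangle|b_0\rangle$ to $|a_n\rangle|b_n\rangle$.

```python
import numpy as np

dQ, dM = 2, 3          # Q has eigenstates a_1, a_2; M has b_0 (neutral), b_1, b_2
dim = dQ * dM

def idx(n, m):
    """Basis index of |a_n> |b_m>  (n = 1..dQ, m = 0..dM-1)."""
    return (n - 1) * dM + m

# S swaps |a_n, b_0> <-> |a_n, b_n> and is the identity elsewhere:
# a permutation matrix, hence unitary, modeling a perfect first-kind measurement.
S = np.eye(dim)
for n in (1, 2):
    i, j = idx(n, 0), idx(n, n)
    S[[i, j]] = S[[j, i]]

# semi-diagonality: <a_q, b_m| S |a_n, b_0> vanishes unless q = m = n
for n in (1, 2):
    nonzero = np.flatnonzero(np.abs(S[:, idx(n, 0)]) > 1e-12)
    assert list(nonzero) == [idx(n, n)]
print("S unitary:", np.allclose(S @ S.T, np.eye(dim)))
```

Degeneracy indices r are omitted here; including them would replace each pointer state by a block, with S still semi-diagonal in the sense $q = m = n$.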
Now the logical game consists of introducing many predicates together with
their associated projectors: some of them describe the history of Q before mea-
surement, some others the history of Q after measurement, a predicate states the
initial value $b_0$, other predicates mention possible final data $b_n$, and finally some
predicates enunciate the possible values of A at time $t_0$ and at time $t_1$. One also
introduces the negations of these predicates, so as to obtain a field of propositions
for the measurement process, altogether forming a logic L.
The first question is to decide whether or not this logic L is consistent. To
answer it, it is convenient to introduce two logics L1 and L2 referring only to the
measured system Q: L1 tells stories of Q before measurement and asserts $A = a_n$
(or not) at time $t_0$; L2 begins with the initial statement $E_0 = |a_n\rangle\langle a_n|$ at time $t_1$
and tells stories of Q after measurement.
One can then prove that L is consistent if and only if L1 and L2 are respectively
consistent.
The occurrence of the initial predicate $E_0$ in L2 is obviously wave-packet re-
duction. Its precise meaning is the following: one can describe the story of Q
after measurement, once it again becomes an isolated system, but the data $B = b_n$
forces us to take the initial preparation predicate $E_0$. The basic nature of wave-
packet reduction turns out to be what logicians call in their own language a modus
ponens: you use, for instance, a modus ponens when you apply a theorem while
forgetting how you proved it, discarding the corresponding implications. Similarly,
one can discard the past history of Q and the whole history of M, taking into
account only the data $B = b_n$ when telling the story of Q after measurement.
One can do this consistently, but it is necessary to use $E_0$ as the initial predicate.
Notice that one might have chosen in mathematics to remember the proofs of all
theorems, and in physics to follow the story of every apparatus and every particle
that came to interact with Q at one time or another. In that sense, wave-packet
reduction is not really essential: it is only a very convenient result. Note, however,
that were we not to use it, we would have to define the initial state at time $t = -\infty$
and maybe introduce the whole universe in our description. So, in that sense, wave-
packet reduction is really very useful.
Knowing that the overall logic L is consistent, one can try to prove some of its
implications. The most interesting one is the following:
$$[B = b_n,\ t_1] \Rightarrow [A = a_n,\ t_1] \qquad (13)$$
or, in words: the result $A = a_n$ of the measurement is a logical consequence of the
data $B = b_n$. The nature of this relation between data and result was left in the
shadows by the Copenhagen interpretation, leading to difficulties such as the EPR
paradox.
Another theorem tells us that, under some trivial restrictions, if one performs
once again a measurement of A after a first measurement giving the result $a_n$,
the second result will also be $a_n$ ("repetitivity").
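Repetitivity is easy to simulate with a classical sampling sketch (invented for illustration, with made-up probabilities): since a first-kind measurement leaves the system in the observed eigenstate $|a_n\rangle$, an immediate second measurement of A reproduces the first result with certainty.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.36, 0.64])            # Born probabilities |c_n|^2 of |a_1>, |a_2>

def measure(probs):
    """Sample an outcome n; a first-kind measurement leaves the system
    in |a_n>, i.e. a post-measurement distribution concentrated on n."""
    n = rng.choice(len(probs), p=probs)
    post = np.zeros_like(probs)
    post[n] = 1.0
    return n, post

for _ in range(1000):
    n1, post = measure(p)
    n2, _ = measure(post)             # immediate second measurement of A
    assert n1 == n2
print("the second measurement always repeats the first")
```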
Finally, one can try to compute the probability for the predicate $[B = b_n, t_1]$
describing the experimental data. Because of the semi-diagonality of the S-matrix,
this probability turns out to depend only upon the properties of the Q-system and
not at all upon the irrelevant degeneracy indices r, which represent the model of the
apparatus, its type, its color, or its age. This probability is simply given by
$$w_n = \langle a_n|\,U(t_1)\,\rho\,U^{-1}(t_1)\,|a_n\rangle, \qquad (14)$$
508 Roland Omnes
i.e., Born's value for the probability of the result $A = a_n$. Using Axiom 3, one
can now consider a series of independent experimental trials, give meaning, as
indubitable fact, to the result of each trial, and therefore give an empirical meaning
to probabilities as representing the frequency of a given physical result. The final
link between the theory and empirical physics is then contained in a last axiom
expressing Born's interpretation of the wave function, i.e., Axiom 6: the theoretical
probability of an experimental result is equal to its empirical frequency.
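The link between Born's value and empirical frequency in a series of independent trials can be sketched by simulation (a minimal illustration with invented numbers, not the chapter's derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
psi = np.array([0.6, 0.8])                 # state of Q in the {|a_n>} basis
rho = np.outer(psi, psi.conj())
w = np.real(np.diag(rho))                  # Born probabilities w_n = <a_n|rho|a_n>

trials = 100_000                           # independent experimental trials
outcomes = rng.choice(len(w), size=trials, p=w)
freq = np.bincount(outcomes, minlength=len(w)) / trials
print(w, freq)                             # empirical frequencies approach w_n
```

With $10^5$ trials the frequencies agree with $w_n = (0.36, 0.64)$ to within a few parts in a thousand, which is the content of Axiom 6 in statistical form.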
So, finally, one has recovered the main results of the Copenhagen interpretation
without several of its limitations and paradoxes. The exact evaluation of these
results, as providing perhaps an escape from the difficulties of quantum mechanics,
will presumably need some time and much discussion, and it would be premature
to assert it now. However, it seems rather clear that the resulting interpretation
is objective.
$$w_i = \mathrm{Tr}(\rho E_i). \qquad (15)$$
The same results would follow from the effective density matrix
$$\rho_{\mathrm{eff}}(0) = \sum_i w_i\,\frac{E_i}{\mathrm{Tr}\,E_i}. \qquad (16)$$
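Equations (15) and (16) can be checked directly in a small matrix sketch (the dimensions, projectors, and state are invented for illustration):

```python
import numpy as np

def effective_density_matrix(rho, projectors):
    """w_i = Tr(rho E_i) and rho_eff(0) = sum_i w_i E_i / Tr(E_i)."""
    w = np.array([np.real(np.trace(rho @ E)) for E in projectors])
    rho_eff = sum(wi * E / np.trace(E) for wi, E in zip(w, projectors))
    return w, rho_eff

# two orthogonal cells (projectors) spanning a 3-dimensional collective space
E1 = np.diag([1.0, 1.0, 0.0])
E2 = np.diag([0.0, 0.0, 1.0])
psi = np.array([0.5, 0.5, np.sqrt(0.5)])
rho = np.outer(psi, psi)

w, rho_eff = effective_density_matrix(rho, [E1, E2])
print(w, np.trace(rho_eff))        # cell weights; rho_eff has unit trace
# coarse graining is self-consistent: rho_eff reproduces the same weights
w2, _ = effective_density_matrix(rho_eff, [E1, E2])
assert np.allclose(w, w2)
```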
One can then follow the successive measurements by using $\rho_{\mathrm{eff}}(0)$, letting it
evolve by $U(t)$ during a time interval $\Delta t$ where the cells remain regular, computing
the $w_i(\Delta t)$, and reconstructing $\rho_{\mathrm{eff}}(\Delta t)$ from them by using Eq. (16). The errors can be
estimated and they increase only linearly in time. The following results can then
be obtained at the rigorous level of theoretical physics, in contrast to mathematical
physics.17
1. The entropy
$$S_{\mathrm{eff}} = -k\,\mathrm{Tr}(\rho_{\mathrm{eff}}\log\rho_{\mathrm{eff}}) \qquad (17)$$
$$f(x,p) = \sum_j w_j\,\chi_j(x,p),$$
where $\chi_j(x,p)$ is the characteristic function of the domain $C_j$. The same procedure,
using the classical equations of motion, leads to a Markov process for the new $w_j$'s
(identical with the old ones for $t = 0$). Then one can show that the classical averages
for a slowly varying dynamical variable coincide with the quantum averages, except
for small, linearly increasing errors. So, classical physics is in fact retrieved, but only
in a statistical sense.
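The entropy of Eq. (17) is readily evaluated once $\rho_{\mathrm{eff}}$ is diagonal in the cell basis; the sketch below (with k set to 1 and invented weights) computes it from the eigenvalues:

```python
import numpy as np

k = 1.0  # Boltzmann's constant, set to 1 here

def S_eff(rho_eff):
    """S_eff = -k Tr(rho_eff log rho_eff), via the eigenvalues of rho_eff."""
    p = np.linalg.eigvalsh(rho_eff)
    p = p[p > 1e-15]                  # convention: 0 log 0 = 0
    return -k * float(np.sum(p * np.log(p)))

print(S_eff(np.diag([0.5, 0.5])))     # equal-weight mixture of two cells: ln 2
print(S_eff(np.diag([1.0, 0.0])))     # a single occupied cell: zero entropy
```

For a single occupied cell the entropy vanishes, and it grows as the weights spread over more cells.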
The EPR experiment is interesting from this point of view for two reasons:
first, because it has led to some puzzling considerations about the transfer of
information;5 furthermore, the non-contradiction theorem makes the logical in-
terpretation highly falsifiable, since any unsolvable paradox should kill it, and it is
interesting to submit it to this awesome test.
Let us, therefore, consider the EPR experiment, for instance in the old but
clear version where one has just two position operators $X_1$ and $X_2$ and conjugate
momenta $P_1$ and $P_2$. Defining the two commuting operators $X = X_1 - X_2$ and
$P = P_1 + P_2$, one considers the wave function
and one performs a precise simultaneous measurement of the two commuting ob-
servables $X_1$ and $P_2$. Let us assume that these measurements yield two data $D_1$
and $D_2$, as read on the corresponding measuring devices.
One can still play the logical game of measurement theory to investigate the
consistency of the process and find out its logical consequences.16 One easily proves,
for instance, the intuitively obvious result.
However, the troublesome and questionable implication standing at the root of the
EPR paradox,
$$\text{``}D_1 \text{ and } D_2\text{''} \Rightarrow \text{``}X_1 = x_1 \text{ and } P_1 = -p_2\text{''},$$
just does not work, because there is no consistent logic according to which it could
make sense. So, if one accepts the universal rule of interpretation, there is no hint
of a paradox and, furthermore, there can be no superluminal transfer of information,
since there is no logic in which such information might be consistently formu-
lated. Remembering that information theory is based upon probability theory, one
seems to have been all along fighting about propositions for which no consistent
probability exists.
The dissolution of the EPR paradox in the logical approach looks very simple,
and one may wonder whether this simplicity is not in some sense as puzzling as the
old paradox itself.
ACKNOWLEDGMENTS
Laboratoire de Physique Théorique et Hautes Énergies is a laboratoire associé
au CNRS.
REFERENCES
1. Caldeira, A. 0., and A. J. Leggett. "Quantum Tunneling in a Dissipative
System." Ann. Phys. 149 (1983):374.
2. Caldeira, A. 0., and A. J. Leggett. "Quantum Tunnelling in a Dissipative
System (erratum)." Ann. Phys. 153 (1983):44.
3. Feynman, R. P., and F. L. Vernon. "The Theory of a General Quantum Sys-
tem Interacting with a Linear Dissipative System." Ann. Phys. 24
(1963):118.
4. Ginibre, J., and G. Velo. "The Classical Limit of Scattering Theory for Non-
Relativistic Many Boson Systems." Comm. Math. Phys. 66 (1979):37.
5. Glauber, R. J. "Amplifiers, Attenuators and Schrödinger's Cat." Ann. N.Y.
Acad. Sci. 480 (1986):336.
6. Griffiths, R. B. "Consistent Histories and the Interpretation of Quantum Me-
chanics." J. Stat. Phys. 36 (1984):219.
7. Hagedorn, G. "Semi-Classical Quantum Mechanics." Ann. Phys. 135
(1981):58.
8. Hepp, K., and E. H. Lieb. "Phase Transitions in Reservoir-Driven Open Sys-
tems with Applications to Lasers and Superconductors." Helv. Phys. Acta 46
(1973):573.
9. Hepp, K. "The Classical Limit for Quantum Mechanical Correlation Func-
tions." Comm. Math. Phys. 35 (1974):265.
10. Hörmander, L. "On the Asymptotic Distribution of the Eigenvalues of
Pseudo-Differential Operators." Arkiv för Mat. 17 (1979):297.
11. Hörmander, L. The Analysis of Linear Partial Differential Operators, 4 volumes.
Berlin: Springer, 1985.
12. Leggett, A. J. Progr. Theor. Phys. 69 (Suppl) (1980):10.
13. Leggett, A. J. "Quantum Tunneling in the Presence of an Arbitrary Linear
Dissipation Mechanism." Phys. Rev. B30 (1984):1208.
14. Omnes, R. "Logical Reformulation of Quantum Mechanics." J. Stat. Phys.
53 (1988):893, 933, 957.
15. Omnes, R. "Projectors in Semi-Classical Physics." J. Stat. Phys. 57(1/2)
(1989).
16. Omnes, R. "The Einstein-Podolsky-Rosen Problem: A New Solution." Phys.
Lett. A 138 (1989):31.
17. Omnes, R. "From Hilbert Space to Common Sense: The Logical Interpreta-
tion of Quantum Mechanics." Unpublished.
18. Prance, H., T. D. Clark, J. E. Mutton, H. Prance, T. D. Spiller, R. J. Prance
et al. "Localization of Pair Charge States in a SQUID." Phys. Lett. 115A
(1986):125.
19. Prance, R. J., J. E. Mutton, H. Prance, T. D. Clark, A. Widom, and G.
Megaloudis. "First Direct Observation of the Quantum Behaviour of a Truly
Macroscopic Object." Helv. Phys. Acta 56 (1983):789.
20. Prance, R. J., et al. Phys. Lett. 107A (1985):133.
A
a priori probabilities, 129, 133
abnormal fluctuations, 321, 323, 325
absolute algorithmic information, 100, 102
absorber theory of radiation, 384
accessible information, 32
action at a distance, 384
adaptation, 139
adaptive evolution, 151, 185
adaptive landscape, 161
adaptive systems, 263, 295
Aharonov-Bohm effect, 5, 11
Aharonov-Bohm experiment, 11
algorithmic complexity, 76, 118, 130, 152, 193, 226, 228, 321-323, 375
algorithmic compressibility, 63-64
algorithmic entropy, 17, 76, 141, 144, 411
algorithmic independence, 80
algorithmic information, 93, 141, 199, 378
    absolute, 97, 100
    prior, 100, 107, 112
algorithmic information content, 73, 76
algorithmic information theory, 96-97, 127, 129-131, 133
algorithmic prior information, 100, 103, 107, 112
algorithmic randomness, 74-76, 208, 228
amplification, 10
    irreversible act of, 15
Anderson localization, 321, 327
anthropic principle, 9, 63
anthropic reasoning, 63
approximate probabilities, 428
arithmetic, 65-66
arrow of time, 62, 405, 407-408, 412, 416, 418-419, 439
    quantum mechanical, 416
    thermodynamical, 408
Aspect experiment, 368
asynchronous computation, 279
attractors, 160, 290, 292
available information, 129

B
baby universes, 467
baker's transformation, 263
band-merging cascades, 223
basin of attraction, 160
Bayesian inference, 234
Bayesian probability theory, 92, 387, 392
Bekenstein number, 6-7, 16
Bekenstein-Hawking information, 67
Bell's inequalities, 41, 376-377, 391, 411, 478, 480-482, 484
Bernoulli flow, 224
Bernoulli process, 246
Bernoulli shift, 263
Bernoulli-Turing machine, 225
BGS entropy, 359-360, 362, 364
    individual, 361
bifurcations, 237
big bang, 61-62, 417
big crunch, 417
bistability, 292
bit
    needed vs. available, 14, 16
    see "it from bit", 3
black holes, 6, 47-51, 53, 408, 418
    entropy of, 6, 47-48, 67
Boltzmann-Gibbs-Shannon entropy, 75, 359-362, 364
Boolean function, 158, 174
    canalizing, 166, 169
Boolean networks, 151, 155, 158-159, 161, 166
    autonomous, 158, 160
    on/off idealization, 156-157
    random, 162-163, 174
    selective adaptation in, 175
Born's interpretation of the wave function, 508
boundary, 4, 9
boundary conditions, 127-128
branch dependence, 450-451
branches, 440
516 Index
C
canonical ensemble, 100, 362
canton, 326
Carnot, 202
Carnot efficiency formula, 83
Casimir effect, 392, 401
Casimir/Unruh-correlations, 415
causality, 396
    relativistic, 349
cell differentiation, 165
cellular automata, 142, 262, 279-280, 290, 297, 314, 377
    1-D unidirectional, 297
    deterministic dynamics of, 297
    universal, 279
Chaitin-Kolmogorov complexity, 228
channel capacity, 32
chaos, 63, 209, 223, 490
chaotic systems, 63
Chomsky's computational hierarchy, 229
Chomsky's grammar, 223
Chomsky's hierarchy, 232, 250, 254-255, 265
Church-Tarski-Turing thesis, 81
Church-Turing thesis, 65, 73, 81, 225, 229
classical Bell's inequalities, 477, 479-484
classical ideal gas, 351
classical limit, 498, 501
classical logic, 503
classical spacetime, 459, 461, 466
classicity, 448
Clausius principle, 362
clock spins, 277
coarse graining, 411
coarse-grained density matrix, 447
coarse-grained sets of histories, 442
coarse-graining, 463
code, 30
code length, 118-124
coding, 30, 32, 36, 74, 95, 120
coding theory, 78, 83, 85
coevolution, 151, 185
cognizability, 66
coherence, 291
coherent states, 461, 501
    in quantum cosmology, 464
collapse of the wave function, 405-407, 413-416
collective behavior, 302
communication, 15-16
communication channel, 29
communication theory, 92, 94
complementarity, 4, 11-12, 17
complex adaptive systems, 453
complex dynamics, 291, 299
complex hierarchical structure, 299
complex macromolecules, 293
complex systems, 152
complexity, 61, 117, 137, 199, 209, 223, 263, 299, 420
    algorithmic, 226, 228
    and computational performance, 209
    Chaitin-Kolmogorov, 228
    conditional, 227
    graph, 234
    latent, 237
    physical, 226
    regular language, 235
complexity catastrophe, 177
complexity-entropy diagram, 263
composite system, 45
compressibility, 130, 132-133
compressibility of information, 83
computability in physical law, 65
computable universe, 66
computation, 223
computational complexity, 208
computational ecosystems, 208-209
computational ergodic theory, 264
computational limits, 67
computational time/space complexity, 141
computational universality, 139
computational velocity, 283
computer
    and consciousness, 15
    evolution of structure, 16
conditional algorithmic information, 98, 100, 106
    average, 101
conditional complexity, 227
conditional entropy, 233
conditional machine table, 294
conditional statistical information, 96
    average, 97
conditional switching dynamics, 294
consciousness, 5, 15
context-free Lindenmayer systems, 223
context-sensitive languages, 257
continuum, 378
H
H-theorem, 360
Hartle-Hawking path integral, 68
Hawking-Bekenstein formula, 67
Hawking formula, 51
Hawking radiation, 50
heat capacity, 248
Heisenberg spins, 337
hidden variables, 275
hierarchical structure, 291
history, 498
horizon, 54
Huffman's coding, 95-96, 195
human brain, 62, 64-68, 331
Huygens' principle, 56
hypercube architectures, 290

I
IGUS, 74-75, 453
incomplete information, 358
indeterminism
    quantum, 413-414
indexed grammar, 256
inequalities, 99-108
inertial observer, 54
inflation, 460
inflationary universe, 460
information, 193-194, 196-197, 223, 359, 382, 390, 405, 508
    absolute algorithmic, 100, 102
    accessible, 32
    algorithmic, 93, 97, 100, 107, 112, 141, 199, 378
    available, 129
    Bekenstein-Hawking, 67
    compressibility of, 83
    conditional algorithmic, 98, 100-101, 106
    conditional statistical, 96-97
    correlation, 33
    dissipation of, 459-460
    distance, 80
    dynamics, 316
    economy of, 45
    free, 234
    genetic, 194
    Gibbs-Shannon statistical, 92, 94, 96
    joint algorithmic, 98, 101
    loss of, 463
    metric, 80
    mutual, 30-31, 33, 141, 144, 244
    mutual algorithmic, 99, 101
    mutual statistical, 97
    physics, 408
    prior, 91, 94, 100, 103, 107, 112
    processing, 289, 299
    processing rate, 68
    Shannon, 236
    Shannon's theory of, 74, 80
    statistical, 93-94
    storage device, 294
    transmission rate, 35
    visual, 331
information-carrying sequences, 199
information-theoretic triangle inequality, 17
information theory, 3, 5, 8, 11-12, 17, 29, 96-97, 127, 129-131, 133, 225, 232, 332, 346
initial condition, 426
instantaneous code, 95, 100-101
interference fringes, 6
intermittency, 262, 319, 321, 325-327
interpretation of formal logic, 498
interpretation of quantum mechanics, 33, 495
irreversibility, 145, 357, 452, 484, 486-487
    quantum, 478
Ising model, 337
    1-D kinetic, 297
Ising spin system, 262
Ising spins, 337
isothermal expansion, 353
it from bit, 3, 5, 7-8, 11-12, 16

J
Jaynes' maximum entropy principle, 360
joint algorithmic information, 98, 101
joint algorithmic randomness, 78
joint statistical information, 97
Josephson junctions, 326
Josephson weak link, 501

K
K-flows, 264, 503, 508
K-systems, 509
Kelvin's efficiency limit, 112-113
Kholevo's theorem, 29, 31-32, 34-36
Kolmogorov entropy, 411, 508
Kolmogorov-Sinai entropy, 321
Kraft inequality, 85, 95, 120

L
labeled directed graphs, 230
Lamb-Retherford experiment, 394
Lamb shift, 384, 393-394, 396-397, 400
    in classical mechanics, 395
Landauer's principle, 113
Laplacean demon, 414
Larmor radiation law, 396
laser pulse, 292
latent complexity, 223, 237
lattice dynamical systems, 262
lattice gas, 314
law of initial conditions, 68
laws, 62-63, 66, 68
    Larmor radiation, 396
    mechanics, 65
    nature, 303
    physics, 9, 65, 67, 301
learning theory
    formal, 230
Lévy distribution, 324
lexicographic tree, 79
light cone, 346, 350
Lindenmayer systems, 223, 255
Liouville equation, 351, 409
    quantum, 412
local entropies, 414
local measurements, 39
local synchronization, 279
local transition table, 297
locality, 64
localized charge-transfer excitations, 291
logic
    structure of, 15
logical depth, 142, 229
logical functions, 295
logical interpretation, 495-496
logistic map, 231
lognormal distribution, 212
loop
    observer-participancy in, 8-9
Lorentz invariance, 312
Lorentz-invariant model of diffusion, 313
loss of information, 463
Lyapunov exponent, 260, 320, 324, 327

M
machine table, 293
macromolecules
    complex, 293
macroscopic quantum effects, 478
magnetic flux, 5
magnetometer
    and it from bit, 6
Manneville-Pomeau map, 320
many-worlds interpretation, 33, 472
Markov chain, 246, 325-326
Markov partition, 241
Markov process, 509
master equations, 409-410, 413, 415
mathematics, 65-66, 68
    unreasonable effectiveness of, 64
maximal device, 359
maximal sets of decohering histories, 445
maximum entropy, 124, 234, 389-390
maximum uncertainty/entropy principle, 360
Maxwell electrodynamics, 10
Maxwell's demon, 73-74, 81-82, 85, 93, 106, 109-112, 405-406
meaning, 13-16
measurement, 231, 290, 362-365, 451
measurement entropy, 358-359, 365
measurement problem, 363-364
measurement process, 291, 406
measurement situations, 451
measurement theory, 506
measurement/preparation process, 358
measuring device, 358
measuring instrument, 230
membrane, 201
    semipermeable, 347
metric complexity, 236
metric entropy, 225, 236
microcanonical ensemble, 100, 104-105
    entropy of, 105
Mind Projection Fallacy, 385
miniaturization, 289
minimal description, 79
minimal program, 97, 129, 152
minimum description length criterion, 121
Misiurewicz parameters, 241, 262, 264
mixed states, 358
Q
Qβ virus, 204
quantization effects, 289
quantum, 4
    also see photon, 14
quantum channels, 32
quantum-classical correspondence, 477
quantum communication, 29
quantum computation, 273
quantum computer, 290
quantum correlations, 411, 478
quantum cosmology, 63-64, 426, 459-461, 463-467
quantum-dots, 289
quantum electrodynamics, 383-384

R
radiation
    relict microwave, 14
random Boolean networks, 155, 162, 174
randomness, 224
real amplitudes, 44
reality is theory, 13
record, 451
recurrent states, 234
reduction
    in quantum measurements, 362
    of wave function, 406
reduction of the wave function, 406
regular language complexity, 235
regular languages, 232
T
Teich, W.G., 292
Telegdi, V., 463, 467
Thorne, Kip, 6
Tipler, Frank, 68
Toffoli, Tommaso, 276, 310, 314-315
Tomonaga, 370
Tsirelson, B.S., 478, 481

U
Uffink, J.B.M., 33
Unruh, W.G., 11, 54, 56, 431, 454, 463

V
VanVechten, D., 430
Velo, G., 502
Venn, John, 388
Vernon, J.R., 444
Vilenkin, A., 463
von Mises, R., 388
von Neumann, J., 346-348, 353, 358, 371, 497
von Schelling, F.W.J., 15

W
Wald, R.M., 55
Walden, R.W., 147
Wang, Xiao-Jing, 322, 325-326
Weaver, W., 85
Weinberg, Steven, 392
Weisbuch, Gerard, 170
Weisskopf, V.F., 394
Welton, T.A., 394, 397
Weyl, Hermann, 4, 9
Wheeler, John A., 10, 13, 67, 367, 377, 382, 394, 430, 463
White, Morton, 15
Wigner, Eugene, 64, 376, 386, 431, 463
Wold, H.O.A., 224, 263
Wolfram, Stephen, 142, 229, 262
Wootters, William, 5, 12, 17
Wright, Chauncey, 15

Z
Zee, A., 340
Zeh, H.D., 11, 430, 443, 449, 463, 465-466
Zurek, Wojciech H., 5-6, 17, 93, 95, 98-99, 104-105, 110, 112, 129, 205, 229, 430, 443, 463, 483
Zvonkin, A.K., 100
Zwanzig, 409, 411, 413