Algorithmic Information Theory
G. J. Chaitin
Historical introduction
Definitions
Let us start by considering a class of Turing machines with the following characteristics. Each Turing machine has three tapes: a program tape, a work tape, and an output tape. There is a scanning head on each of the three tapes. The program tape is read-only and each of its squares contains a 0 or a 1. It may be shifted in only one direction. The work tape may be shifted in either direction and may be read and erased, and each of its squares contains a blank, a 0, or a 1. The work tape is initially blank. The output tape may be shifted in only one direction. Its squares are initially blank, may have a 0, a 1, or a comma written on them, and cannot be rewritten. Each Turing machine of this type has a finite number n of states, and is defined by an n × 3 table, which gives the action to be performed and the next state as a function of the current state and the contents of the square of the work tape that is currently being scanned. The first state in this table is by convention the initial state. There are eleven possible actions: halt, shift work tape left/right, write blank/0/1 on work tape, read the square of the program tape currently being scanned and copy it onto the square of the work tape currently being scanned and then shift the program tape, write 0/1/comma on the output tape and then shift the output tape, and consult the oracle. The oracle is included for the purpose of defining relative concepts. It enables the Turing machine to choose between two possible state transitions, depending on whether or not the binary string currently being scanned on the work tape is in a certain set, which for now we shall take to be the null set.
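
To make the machine model concrete, here is a minimal sketch in Python (an illustration only, not Chaitin's formalism; run, table, and the dictionary-based tapes are inventions of this sketch). The work tape is a dictionary from positions to symbols, the read-only program tape is a list of bits consumed left to right, and the oracle set is taken to be the null set, as in the text.

    BLANK, ZERO, ONE = " ", "0", "1"

    # The eleven possible actions enumerated above.
    (HALT, SHIFT_LEFT, SHIFT_RIGHT,
     WRITE_BLANK, WRITE_0, WRITE_1,
     READ_PROGRAM, OUT_0, OUT_1, OUT_COMMA, ORACLE) = range(11)

    def run(table, program, max_steps=10_000):
        """table[state][symbol] -> (action, next_state); an ORACLE entry is
        (ORACLE, next_if_in_set, next_if_not_in_set).  State 0 is initial,
        matching the convention in the text."""
        work, pos, pc, state, out = {}, 0, 0, 0, []
        for _ in range(max_steps):
            entry = table[state][work.get(pos, BLANK)]
            action = entry[0]
            if action == HALT:
                return "".join(out), pc       # output tape, program bits read
            if action == SHIFT_LEFT:
                pos -= 1
            elif action == SHIFT_RIGHT:
                pos += 1
            elif action in (WRITE_BLANK, WRITE_0, WRITE_1):
                work[pos] = {WRITE_BLANK: BLANK, WRITE_0: ZERO, WRITE_1: ONE}[action]
            elif action == READ_PROGRAM:
                if pc >= len(program):
                    return None, pc           # ran off the program tape
                work[pos] = program[pc]       # copy the program bit, then
                pc += 1                       # shift the program tape
            elif action in (OUT_0, OUT_1, OUT_COMMA):
                out.append({OUT_0: "0", OUT_1: "1", OUT_COMMA: ","}[action])
            elif action == ORACLE:
                # The oracle set is the null set, so the "not in the set"
                # branch is always taken.
                state = entry[2]
                continue
            state = entry[1]
        return None, pc                       # no halt within the step budget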
From each Turing machine M of this type we define a probability P, an entropy H, and a complexity I. P(s) is the probability that M eventually halts with the string s written on its output tape if each square of the program tape is filled with a 0 or a 1 by a separate toss of an unbiased coin. By string we shall always mean a finite binary string. From the probability P(s) we obtain the entropy H(s) by taking the negative base-two logarithm, i.e., H(s) is -log₂P(s). A string p is said to be a program if when it is written on M's program tape and M starts computing scanning the first bit of p, then M eventually halts after reading all of p and without reading any other squares of the tape. A program p is said to be a minimal program if no other program makes M produce the same output and has a smaller size. And finally the complexity I(s) is defined to be the least n such that for some contents of its program tape M eventually halts with s written on the output tape after reading precisely n squares of the program tape; i.e., I(s) is the size of a minimal program.
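
As a concrete, if hopelessly inefficient, illustration of this definition, P(s) can be estimated empirically by sampling coin-flip program tapes. The sketch below reuses the hypothetical run simulator above; truncating the tape and the step count is an approximation of ours, not part of the definition.

    import random

    def estimate_P(table, s, trials=10_000, tape_len=64, max_steps=10_000):
        """Monte-Carlo estimate of P(s): fill the program tape with fair
        coin flips and count how often the machine halts with exactly s
        on its output tape."""
        hits = 0
        for _ in range(trials):
            program = [random.choice("01") for _ in range(tape_len)]
            out, _bits_read = run(table, program, max_steps)
            if out == s:
                hits += 1
        return hits / trials

    # H(s) would then be approximated by -log2(estimate_P(table, s)).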
These definitions depend on the particular Turing machine M; to obtain absolute concepts we use a standard universal Turing machine U, which can simulate any machine of this type. By the ith Turing machine we mean the one that comes ith in a list of all possible defining tables in which the tables are ordered by size (i.e., number of states) and lexicographically among those of the same size. With this choice of Turing machine, P, H, and I can be dignified with the following titles: P(s) is the algorithmic probability of s, H(s) is the algorithmic entropy of s, and I(s) is the algorithmic information of s. Following Solomonoff [3], P(s) and H(s) may also be called the a priori probability and entropy of s. I(s) may also be termed the descriptive, program-size, or information-theoretic complexity of s. And since P is maximal and H and I are minimal, comparison with a suitable special-purpose Turing machine shows that

P(s) ≥ exp₂(-2 length(s) - O(1)),
H(s) ≤ 2 length(s) + O(1), and
I(s) ≤ 2 length(s) + O(1).
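
One standard way to see where bounds of the form H(s) ≤ 2 length(s) + O(1) come from (a sketch of the usual construction, not necessarily the one intended above): a special-purpose machine can expect s in the self-delimiting code that doubles every bit and marks the end with the pair 01, so that s is specified by 2 length(s) + 2 program bits.

    def encode(s: str) -> str:
        """Self-delimiting code: '0' -> '00', '1' -> '11', end marker '01'."""
        return "".join(2 * b for b in s) + "01"

    def decode(tape: str) -> str:
        """Read bit pairs until the '01' end marker, recovering s."""
        s, i = [], 0
        while tape[i:i + 2] != "01":
            s.append(tape[i])          # pairs are '00' or '11'
            i += 2
        return "".join(s)

    assert decode(encode("10110")) == "10110"
    assert len(encode("10110")) == 2 * 5 + 2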
We have defined P(s), H(s), and I(s) for individual strings s. It is also convenient to consider computations which produce finite sequences of strings. These are separated by commas on the output tape. One thus defines the joint probability P(s₁, ..., sₙ), the joint entropy H(s₁, ..., sₙ), and the joint complexity I(s₁, ..., sₙ) of an n-tuple s₁, ..., sₙ. Finally one defines the conditional probability P(t₁, ..., tₘ|s₁, ..., sₙ) of the m-tuple t₁, ..., tₘ given the n-tuple s₁, ..., sₙ to be the joint probability of the n-tuple and the m-tuple divided by the joint probability of the n-tuple. In particular P(t|s) is defined to be P(s,t)/P(s). And of course the conditional entropy is defined to be the negative base-two logarithm of the conditional probability. Thus by definition H(s,t) = H(s) + H(t|s). Finally, in order to extend the above definitions to tuples whose members may either be strings or natural numbers, we identify the natural number n with its binary representation.
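
A toy numerical check of this last identity, with invented values standing in for algorithmic probabilities: since P(t|s) is defined as the quotient P(s,t)/P(s), the base-two logarithms add up exactly.

    from math import log2

    def H(p):
        return -log2(p)               # entropy = negative base-two logarithm

    P_s, P_st = 1 / 8, 1 / 32         # hypothetical values of P(s), P(s,t)
    P_t_given_s = P_st / P_s          # = 1/4, by the definition above

    assert H(P_st) == H(P_s) + H(P_t_given_s)   # 5 = 3 + 2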
Basic relationships
We now review some basic properties of these concepts. The relation

H(s,t) = H(t,s) + O(1)

states that the probability of computing the pair s,t is essentially the same as the probability of computing the pair t,s. The inequality

H(s,t) ≤ H(s) + H(t) + O(1)

states that the probability of computing the pair s,t is not less than the product of the probabilities of computing s and t, and follows from the fact that programs are self-delimiting and can be concatenated. The inequality

O(1) ≤ H(t|s) ≤ H(t) + O(1)

states that the conditional entropy is nonnegative and at most the unconditional entropy, in both cases to within O(1). There is a corresponding subadditivity for complexities: I(s,t) ≤ I(s) + I(t) + O(1).
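
The concatenation argument can again be illustrated with the toy bit-doubling code (an illustration, not the machine model itself): because each codeword announces its own end, two codewords laid end to end can be split apart with no separator, which is what lets a program for s and a program for t combine into a program for the pair.

    def encode(s):                    # the self-delimiting code from above
        return "".join(2 * b for b in s) + "01"

    def decode_two(tape: str):
        """Split a concatenation of two self-delimiting codewords."""
        def read_one(i):
            s = []
            while tape[i:i + 2] != "01":
                s.append(tape[i])
                i += 2
            return "".join(s), i + 2  # skip the '01' end marker
        s, i = read_one(0)
        t, _ = read_one(i)
        return s, t

    assert decode_two(encode("101") + encode("0")) == ("101", "0")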
In order to define the probability, entropy, and complexity of r.e. (recursively enumerable) sets of strings it is necessary to consider unending computations performed on our standard universal Turing machine U. A computation is said to produce an r.e. set of strings A if all the members of A and only members of A are eventually written on the output tape, each followed by a comma. It is important that U not be required to halt if A is finite. The members of the set A may be written in arbitrary order, and duplications are ignored. A technical point: if there are only finitely many strings written on the output tape, and the last one is infinite or is not followed by a comma, then it is considered to be an unfinished string and is also ignored. Note that since computations may be endless, it is now possible for a semi-infinite portion of the program tape to be read.
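
The two conventions just stated, that duplications are ignored and that a final string not yet followed by a comma is unfinished, amount to the following reading of any finite prefix of the output tape (a sketch; tape_to_set is an invented helper):

    def tape_to_set(tape: str) -> set:
        """Strings completed so far on an output tape: tokens are
        comma-terminated, so the trailing token is unfinished and is
        dropped, and duplicates collapse into the set."""
        return set(tape.split(",")[:-1])

    assert tape_to_set("01,1,01,") == {"01", "1"}   # duplicate ignored
    assert tape_to_set("01,1,0") == {"01", "1"}     # '0' unfinished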
The definitions of the probability P(A), the entropy H(A), and the complexity I(A) of an r.e. set of strings A may now be given. P(A) is the probability that U produces the output set A if each square of its program tape is filled with a 0 or a 1 by a separate toss of an unbiased coin. H(A) is the negative base-two logarithm of P(A). And I(A) is the size in bits of a minimal program that produces the output set A, i.e., I(A) is the least n such that there is a program tape contents that makes U undertake a computation in the course of which it reads precisely n squares of the program tape and produces the set of strings A. In order to define the joint and conditional probability and entropy we need a mechanism for encoding two r.e. sets A and B into a single set A join B; this is the recursion-theoretic join, obtained by prefixing each member of A with a 0 and each member of B with a 1. With the joint and conditional concepts defined from A join B as before, one obtains relations analogous to those for strings:

H(A,B) = H(B,A) + O(1),
H(A) ≤ H(A,B) + O(1),
O(1) ≤ H(B|A) ≤ H(B) + O(1),
I(A,B) ≤ I(A) + I(B) + O(1).
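
The 0/1-prefix form of the join used above is easy to make concrete (a sketch; that A contributes the 0-prefixed strings is a convention assumed here):

    def join(A: set, B: set) -> set:
        return {"0" + s for s in A} | {"1" + s for s in B}

    def split(J: set) -> tuple:
        A = {s[1:] for s in J if s.startswith("0")}
        B = {s[1:] for s in J if s.startswith("1")}
        return A, B

    assert split(join({"", "01"}, {"1"})) == ({"", "01"}, {"1"})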
There is an exact relationship between I and H in the case of singleton sets: for singleton A, I(A) = H(A) + O(1). There is also a close but not an exact relationship between H and I in the case of sets consisting of initial segments of the set of natural numbers (recall we identify the natural number n with its binary representation): for initial A, I(A) = H(A) + O(log H(A)). Moreover, it is possible to show that there are infinitely many initial sets A for which I(A) exceeds H(A) by a quantity of order log H(A). This is the greatest known discrepancy between I and H for r.e. sets. It is demonstrated by showing that occasionally the number of initial sets A with H(A) < n is appreciably greater than the number of initial sets A with I(A) < n. On the other hand, with the aid of a crucial game-theoretic lemma of D. A. Martin, Solovay [28] has shown that

I(A) ≤ 3H(A) + O(log H(A)).
Acknowledgments
References
1. M. L. Minsky, Problems of Formulation for Artificial Intelligence, Mathematical Problems in the Biological Sciences, Proceedings of Symposia in Applied Mathematics XIV, R. E. Bellman, ed., American Mathematical Society, Providence, RI, 1962, p. 35.
29. A. K. Zvonkin and L. A. Levin, The Complexity of Finite Objects and the Development of the Concepts of Information and Randomness by Means of the Theory of Algorithms, Russ. Math. Surv. 25, No. 6, 83 (1970).
30. L. A. Levin, On the Notion of a Random Sequence, Soviet Math. Dokl. 14, 1413 (1973).
31. P. Gács, On the Symmetry of Algorithmic Information, Soviet Math. Dokl. 15, 1477 (1974); Corrections, Soviet Math. Dokl. 15, No. 6, v (1974).
32. L. A. Levin, Laws of Information Conservation (Nongrowth) and Aspects of the Foundation of Probability Theory, Prob. Info. Transmission 10, 206 (1974).
33. L. A. Levin, Uniform Tests of Randomness, Soviet Math. Dokl. 17, 337 (1976).
34. L. A. Levin, Various Measures of Complexity for Finite Objects (Axiomatic Description), Soviet Math. Dokl. 17, 522 (1976).
35. L. A. Levin, On the Principle of Conservation of Information in Intuitionistic Mathematics, Soviet Math. Dokl. 17, 601 (1976).
36. D. E. Knuth, Seminumerical Algorithms. The Art of Computer Programming, Volume 2, Addison-Wesley Publishing Co., Inc., Reading, MA, 1969. See Ch. 2, Random Numbers, p. 1.
37. D. W. Loveland, A Variant of the Kolmogorov Concept of Complexity, Info. Control 15, 510 (1969).
38. T. L. Fine, Theories of Probability: An Examination of Foundations, Academic Press, Inc., New York, 1973. See Ch. V, Computational Complexity, Random Sequences, and Probability, p. 118.
39. J. T. Schwartz, On Programming: An Interim Report on the SETL Project. Installment I: Generalities, Lecture Notes, Courant Institute of Mathematical Sciences, New York University, 1973. See Item 1, On the Sources of Difficulty in Programming, p. 1, and Item 2, A Second General Reflection on Programming, p. 12.
40. T. Kamae, On Kolmogorov's Complexity and Information, Osaka J. Math. 10, 305 (1973).
41. C. P. Schnorr, Process Complexity and Effective Random Tests, J. Comput. Syst. Sci. 7, 376 (1973).
42. M. E. Hellman, The Information Theoretic Approach to Cryptography, Information Systems Laboratory, Center for Systems Research, Stanford University, April 1974.
43. W. L. Gewirtz, Investigations in the Theory of Descriptive Complexity, Courant Computer Science Report 5, Courant Institute of Mathematical Sciences, New York University, October 1974.
44. R. P. Daley, Minimal-program Complexity of Pseudo-recursive and Pseudo-random Sequences, Math. Syst. Theor. 9, 83 (1975).
45. R. P. Daley, Noncomplex Sequences: Characterizations and Examples, J. Symbol. Logic 41, 626 (1976).
46. J. Gruska, Descriptional Complexity (of Languages): A Short Survey, Mathematical Foundations of Computer Science 1976, A. Mazurkiewicz, ed., Lecture Notes in Computer Science 45, Springer-Verlag, Berlin, 1976, p. 65.
47. J. Ziv, Coding Theorems for Individual Sequences, undated manuscript, Bell Laboratories, Murray Hill, NJ.
48. R. M. Solovay, A Model of Set-theory in which Every Set of Reals is Lebesgue Measurable, Ann. Math. 92, 1 (1970).
49. R. Solovay and V. Strassen, A Fast Monte-Carlo Test for Primality, SIAM J. Comput. 6, 84 (1977).
50. G. H. Hardy, A Course of Pure Mathematics, Tenth edition, Cambridge University Press, London, 1952. See Section 218, Logarithmic Tests of Convergence for Series and Integrals, p. 417.
51. M. Gardner, A Collection of Tantalizing Fallacies of Mathematics, Sci. Amer. 198, No. 1, 92 (1958).
52. B. Russell, Mathematical Logic as Based on the Theory of Types, From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, J. van Heijenoort, ed., Harvard University Press, Cambridge, MA, 1967, p. 153; reprinted from Amer. J. Math. 30, 222 (1908).
Received February 2, 1977; revised March 9, 1977
The author is located at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York 10598.