0423.001. A Calculation of The Probability of Spontaneous Biogenesis by Information Theory - HUBERT P. YOCKEY

J. theor. Biol.
(1977) 67, 377-398
A Calculation of the Probability of Spontaneous Biogenesis by Information Theory
HUBERT P. YOCKEY
Army Pulse Radiation Facility, Aberdeen Proving Ground, Maryland 21005, U.S.A.
(Received 10 November 1975, and in revised form 16 August 1976)
The Darwin-Oparin-Haldane “warm little pond” scenario for biogenesis
is examined by using information theory to calculate the probability that
an informational biomolecule of reasonable biochemical specificity, long
enough to provide a genome for the “protobiont”, could have appeared in
$10^{9}$ years in the primitive soup. Certain old untenable ideas have served
only to confuse the solution of the problem. Negentropy is not a concept
because entropy cannot be negative. The role that negentropy has played
in previous discussions is replaced by “complexity” as defined in inform-
ation theory. A satisfactory scenario for spontaneous biogenesis requires
the generation of “complexity” not “order”. Previous calculations based
on simple combinatorial analysis overestimate the number of sequences
by a factor of $10^{5}$. The number of cytochrome c sequences is about
$3.8 \times 10^{61}$. The probability of selecting one such sequence at random is
about $2.1 \times 10^{-65}$. The primitive milieu will contain a racemic mixture
of the biological amino acids and also many analogues and non-biological
amino acids. Taking into account only the effect of the racemic mixture
the longest genome which could be expected with 95% confidence in
$10^{9}$ years corresponds to only 49 amino acid residues. This is much too
short to code a living system so evolution to higher forms could not get
started. Geological evidence for the “warm little pond” is missing. It is
concluded that belief in currently accepted scenarios of spontaneous
biogenesis is based on faith, contrary to conventional wisdom.
1. Introduction
Currently accepted scenarios concerning the origin of life are based on the
Darwin-Oparin-Haldane “warm little pond” concept in which nucleotides,
amino acids and all the basic compounds necessary to life are thought to
have been formed by chemical and physical processes during a period of
chemical evolution (Calvin, 1967; Haldane, 1928; Oparin, 1924, 1957). The
“warm little pond” may have been the whole ocean. According to this
scenario these components assembled and disassembled into polymers
(Kaplan, 1974) and from this milieu the first object which could be regarded
as living appeared by chance. Once this object, or “protobiont”, was formed
it found itself in an enormous nutrient culture which was consumed by its
progeny in perhaps several hundred million years more or less and the
primitive earth pullulated with organisms. The period of organic evolution
began near the end of this time as the exhaustion of the primeval nutrients
approached and an era of predation and photosynthesis appeared. Thus the
conditions for biogenesis vanished and the possibility of a second appearance
of life was eliminated. Some aspects of life such as the chirality of amino
acids (Miller & Orgel, 1974), the selection of those coded and the universality
of the genetic code have been said to reflect a “frozen accident” due to a
single origin event (Crick, 1968; Ohno, 1973).
This scenario has been discussed in detail by Shklovskii & Sagan (1966)
and by Miller & Orgel (1974) and the general philosophy was given by
Simpson (1964). According to him, “Virtually all biochemists agree that life
on earth arose spontaneously from nonliving matter and that it would almost
inevitably arise on sufficiently similar young planets elsewhere”. It is hoped
that the reader is familiar with Simpson’s paper with regard to this and
other points to be discussed which will not necessarily be referenced
specifically.
It is the purpose of this paper to use information theory to calculate the
probability that a protobiont genome could have arisen from randomly
assembled polymers. Previous authors (Kaplan, 1974) have used simple
combinatorial analysis which, in addition to other faults, does not take into
account the fact that all amino acids are not equally probable. Results will
be presented pertaining to specific examples and, of course, these mathem-
atical methods can be applied by the reader to other forms of the spontaneous
generation scenario which may arise. In order to put the problem in
mathematical form we rephrase Simpson’s comment as follows. “All bio-
chemists, except for a set of very small probability, are virtually certain
(i.e. at least 95% certain) that life on earth arose spontaneously from non-
living matter and that it also would arise on a sufficiently similar young planet
elsewhere.” Accordingly, any scenario which is to survive our analysis must
have a probability of ~0.95 or greater and if it consists of n steps or system
components each on the average must have a probability of $(0.95)^{1/n}$.
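The per-step requirement can be made concrete with a short numerical sketch. It assumes, as the rephrased statement implies, that the n steps are independent so that their probabilities multiply; the function name is illustrative and does not come from the paper.

```python
def per_step_probability(overall: float, n_steps: int) -> float:
    """Average probability each of n_steps independent steps must have
    so that the product of all of them equals `overall`.

    Assumption (not stated explicitly in the text): the steps are independent.
    """
    return overall ** (1.0 / n_steps)


# A scenario with more steps leaves less room for doubt at each step.
for n in (1, 10, 100, 1000):
    print(f"n = {n:4d}: each step needs p = {per_step_probability(0.95, n):.5f}")
```

With 100 steps, for example, each step must on average succeed with a probability above 0.999.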
The question, “What is truth?” has been asked but not answered (John
18:38). The theory of probability has no answer either but it does present us
with two choices. If we observe an event which we believe to be highly
improbable we may, (a) congratulate ourselves for being so fortunate, (b)
disbelieve or distrust the basic assumptions. For example, if we see a tossed
coin come up heads ten times, either we have witnessed a very rare event
(probability $2^{-10} = 1/1024$), or the event is expected because the coin is
two headed. If the test is successful 32 times we may be the ecstatic witnesses
of an event whose probability is $2.33 \times 10^{-10}$. All scientists and other practical
men, except for a set of very small probability, would, however, be virtually
certain that the coin is two headed even without examining it. By the same
token the conclusion that life arose by a very lucky accident only once in the
universe, on earth about $4 \times 10^{9}$ years ago (Monod, 1971) begs the question
and must be rejected as a scientific explanation of the origin of life. A
rationalist will hardly use standards of credibility for scenarios dealing with
the origin of life less critical than those used to test other scientific hypotheses.
Research on the origin of life seems to be unique in that the conclusion has
already been authoritatively accepted (Simpson, 1964; Eigen, 1971). What
remains to be done is to find the scenarios which describe the detailed
mechanisms and processes by which this happened.
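The coin-tossing figures quoted in the two-headed coin example can be checked exactly. The sketch below assumes a fair coin and independent tosses; the helper name is hypothetical.

```python
from fractions import Fraction


def all_heads_probability(n_tosses: int) -> Fraction:
    """Exact probability that a fair coin shows heads on every one of n_tosses."""
    return Fraction(1, 2) ** n_tosses


p10 = all_heads_probability(10)
p32 = all_heads_probability(32)
print(p10)                  # the ten-toss case, 1/1024
print(f"{float(p32):.3e}")  # the 32-toss case, roughly 2.33e-10
```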
Chemists who concern themselves with the era of chemical evolution
believe (Simpson, 1964; Eigen, 1971) that once the ancient milieu contained
the building stones of life in the form of the biologically necessary compounds
in solution the spontaneous emergence, that is, emergence by chance, of life
was inevitable by some process resembling crystallization or chemical
reactions. However, stones do not spontaneously make stone walls and these
in turn do not inevitably a prison make, nor in New England, good neighbors.
Whether one wishes to believe that life is common in the universe or not the
crux of the problem is to show how life could have or must have originated
on Earth and possibly on Mars.
2. On Order and Complexity in Crystallography and Biology

Many authors have attempted to relate the “order” as understood in
crystallography with the notion of genetic specificity, or order, in molecular
biology. Let us examine how or whether such “order” can lead to biological
specificity. When a physicist or a chemist thinks of order he has in mind for
the most part the structure of crystals and chemical compounds, or perhaps
coefficients of correlation in liquids, solids and the like. For example, there
are two alloys of Cu and Au in which, in a properly annealed crystal, the
constituent atoms arrange themselves on planes in particular lattice sites.
These alloys are CuAu and Cu$_3$Au. This superstructure or “long range
order” is destroyed by an order-disorder transformation which occurs at a
well defined temperature called the “Curie temperature”. A “short range
order” exists above this point which reflects a tendency of atoms to have
unlike neighbors. Other manifestations of order or co-operative phenomena
which also have a Curie temperature are ferromagnetism and ferroelectricity.
Order parameters of several sorts are introduced somewhat phenomenologically
to discuss these matters theoretically. The maximum order as
determined by, say X-ray diffraction, always occurs when all atoms are
neatly placed at their assigned sites with a minimum of “defects”. This order,
then, is a means of comparing intellectually the real crystal or real sample of
a substance with an ideal standard crystal which is a member of one of the
seven crystal systems, 32 point groups and 230 space groups allowed by
Euclidean geometry. This concept of order has an anthropocentric or perhaps
aesthetic character. A piece of glass however perfect and uniform from the
optical point of view has less order than a crystal. The anthropocentrism of
this view is further illustrated by comparing the order in the transcendental
numbers $\pi$ and e with order in fractions. The fractions have order since the
decimal expression repeats a sequence of digits indefinitely. The trans-
cendental numbers have no repeats or other correlations and many statistical
tests would not be able to distinguish them from entries in a table of random
numbers. Yet clearly $\pi$ and e are not random numbers.
Order, as so conceived, may be very useful in crystallography and in other
fields but it will not lead to specificity in molecular biology. Polanyi (1968)
has pointed out that the orderliness of the crystal precludes the capacity to
function as a code. As we discussed previously (Yockey, 1974) the crystal
carries very little information. Attempts to relate the idea of “order” in a
crystal with biological organization or specificity must be regarded as a play
on words which cannot stand careful scrutiny. Informational macro-
molecules can code genetic messages and therefore can carry information
because the sequence of bases or residues is affected very little if at all by
physicochemical factors.
An uninvited guest (Schroedinger, 1955; du Noüy, 1947; Prigogine &
Nicolis, 1971; Gatlin, 1972; Prigogine, Nicolis & Babloyantz, 1972;
Vol’kenstein, 1973) at any discussion of the origin of life and of evolution
from the materialistic reductionist point of view, is the role of thermo-
dynamic entropy and the “heat death” of the universe which it predicts. The
universe should in every way go from states which are less probable to those
which are more probable. Therefore hot bodies cool; energy is conserved but
becomes less available to do work. According to this uninvited guest, the
spontaneous generation of life is highly improbable (Prigogine, Nicolis &
Babloyantz, 1972). The uninvited guest will not go away nor will the
biological evidence to the contrary notwithstanding.
Writers on this topic seem to be unaware that entropy is a property of the
probability distribution of which it is a function. Every probability distri-
bution has an entropy (Shannon, 1948; Khinchin, 1957). In physics and
chemistry it is always the entropy of the probability distribution in energy
which is discussed. Entropies calculated from different probability distri-
butions have nothing to do with each other. For example, there is nothing
in information theory which corresponds to free energy or temperature. A
sample of sodium chloride whatever its origin, whenever it was created,
anywhere in the universe will find its unique and proper crystal structure. To
the extent that this depends on an entropy it will be the entropy of the
probability distribution in the allowed energy states. The passenger pigeon,
however, was described by the ensemble of genetic messages in its genome.
The pertinent entropies are the frequency distribution of nucleotide bases
and amino acid residues into which these messages were translated. This
ensemble of genetic messages will never appear again. The efforts of Bernal
(1967), Prigogine, Nicolis & Babloyantz (1972), Eigen (1971) and others to
relate considerations which involve thermodynamic entropy to biology are
misdirected if only because they are using an entropy which is irrelevant to
the problem. We see, therefore, that the uninvited guest is a ghost. It can be
shown that the cell obeys the laws of thermodynamics quite well (Bremermann,
1967). The flow of energy in an organism is an important aspect of life, how-
ever, the genome functions as a control of this energy flow and in a sense
operates orthogonal to it. As far as the origin of life is concerned thermo-
dynamics and “heat death” of the universe are irrelevant. We must consider
genetic entropy not thermodynamic entropy.
In an attempt to deal with this uncomfortable situation, Schroedinger
(1955) introduced “negentropy” and this was further developed and applied,
mostly in physics, by Brillouin (1962). Entropy is regarded as a measure of
disorder and its negative therefore as a measure of order. A fatal objection
to negentropy is that the function which satisfies the mathematical require-
ments expressing the intuitive notion of disorder or uncertainty is not unique
unless it is positive (Khinchin, 1957). The highest state of order or certainty
is expressed when entropy is zero. According to this theorem there is no
such thing as negentropy. That is, negentropy as defined and used by
Schroedinger (1955), Brillouin (1962) and others is not unique and does not
obey even a one sided conservation law. Negentropy is like the virtual image
in a mirror. It may seem very real but actually nothing is there.
One seems to be frustrated in the attempt to penetrate this problem.
According to information theory the largest value of entropy corresponds to
the largest information capacity yet a large entropy corresponds to disorder
or randomness. On the other hand some authors regard entropy as a measure
of lack of information. If this is baffling, it is because we are interpreting the
meaning of mathematical functions according to their names (Yockey,
1974). As we shall show presently, if one follows the mathematical argument
itself, the dilemma disappears.
The other important and remarkably prescient idea introduced to biology
by Schroedinger (1955), namely, the incorporation of an “hereditary code-
script” in the chromosome viewed as an “aperiodic” crystal, survives in the
form of the genetic code and the sequence hypothesis. He was the first to
point out the enormous number of arrangements which could be achieved in
this way and which is more than sufficient to record the vast range of bio-
logical specificity. His book was written long before the role of nucleic acids
as the “aperiodic” crystal incorporating the “hereditary code-script” was
known. We will show presently that certain theorems in information theory
which have been proved during the last decade allow us to replace “negen-
tropy” by a valid concept. This concept depends on mathematical notions
of what is meant by “aperiodic” and by ordered arrangements.
Let us now return for a moment to the two-headed coin problem. Flipping
a coin will generate one of the sequences belonging to the ensemble of all
possible sequences of heads and tails. Each of these sequences has exactly the
same probability. On this basis we have no justification for regarding a series
of all heads or all tails as being at all unusual or statistically different from
any other sequence of heads and tails. Yet we feel intuitively that any sequence
which has order, that is, any sequence which exhibits a pattern or a periodicity
according to which a rule can be stated such that the sequence could be con-
tinued indefinitely, is different from others. Is there a mathematical basis for
this feeling and how is this related to distinguishing a random sequence from
an orderly non-random one? The fact that a sequence originates from a
series of probabilistic events is clearly not very helpful in making this
distinction.
An answer to this question has emerged from the initially independent
work of Kolmogoroff (1965a,b), Chaitin (1966) and Solomonoff (1964) on
the foundations of information theory and its relation to probability. The
germ of the idea is the observation that a long sequence of heads or tails can
be described by a rule or algorithm much shorter than the sequence. Suppose
the algorithm is “heads, n times”. If the number of heads, n, is written in
binary digits this message increases in length approximately as $\log_2 n$,
whereas the message itself is n binary digits long. Suppose n equals 32; then
$\log_2 32 = 5.00$ so we can specify a string of 32 heads or tails by a message
of 1 + 5 or 6 binary digits. The next more complicated sequences, namely,
HTHT... and THTH... require more information to specify HT or
TH. In general, the more complex the pattern the longer is the message
describing it. In the limit of complexity when there is no discernible pattern,
that is, when the sequence is “aperiodic” one must specify each symbol in
turn indefinitely. Such a message is just as long as the sequence it describes.
This idea does not depend on the origin of a sequence. In view of this we see
that the factor which distinguishes the all heads or all tails sequences from
the others is that these two are the most highly ordered, and they can be
described by the shortest algorithm. Therefore they contain the least amount
of information.
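The bookkeeping in the “heads, n times” example can be sketched as follows. The sketch follows the approximation used above (one binary digit to name the symbol, about $\log_2 n$ binary digits for the count); the function names are illustrative only.

```python
import math


def rule_length_bits(n: int) -> int:
    """Length in binary digits of the rule '<H or T>, n times'.

    One digit names the symbol; about log2(n) digits give the count,
    following the approximation used in the text.
    """
    return 1 + math.ceil(math.log2(n))


def literal_length_bits(n: int) -> int:
    """Length of the sequence itself when every toss is written out."""
    return n


for n in (32, 1024, 1_000_000):
    print(n, rule_length_bits(n), literal_length_bits(n))
```

For n = 32 the rule takes 6 binary digits against 32 for the literal sequence, and the gap widens rapidly as n grows.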
If a gambler believes he perceives a structure or order in a stochastic
sequence, even a probabilistic structure, he attributes this to a fault or
“unfairness” in the generator of the stochastic sequence and plans his wagers
accordingly. By the same token, if a scientist sees a pattern or a regularity
in his data he constructs a theory which allows him to describe his results in
a more compact form than simply listing them in a table. If the theory is a
great deal more compact than the data it is called a “law of Nature”. If the
theory is almost as complex as the data it is ad hoc even though the theory
“explains” the data perfectly. It often happens that a clever gambler or
scientist perceives a pattern in a short sequence. It is the Nemesis of both
that, as the sequence becomes longer, and if it is very complex or random,
such patterns are wiped out. As the sequence becomes longer the algorithm
undergoes frequent modification and embellishment until it becomes even-
tually as long as the sequence itself. We may, at least in principle, arrange the
ensemble of sequences of binary digits in a hierarchy according to the in-
formation content of the algorithm describing each. We now realize that the
sequences with the longest algorithms have the largest entropy and are the
most complex. Therefore they also have the largest information content. A
random sequence is the most complex sequence of all since we cannot predict
its future behavior on the basis of past performance. The algorithmic inter-
pretation of information theory provides a means of discussing and measuring
order in all degrees of complexities from a unified point of view.
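The hierarchy of sequences by description length can be illustrated crudely with a general-purpose compressor. This is only a stand-in, and an assumption of this sketch: the complexity discussed here is defined through the shortest algorithm, which cannot in general be found, so a compressed size gives at best an upper bound on it. The sequence length and random seed are arbitrary choices.

```python
import random
import zlib


def compressed_size(sequence: str) -> int:
    """Bytes zlib needs for the sequence: a rough upper bound on its
    description length, not the true algorithmic complexity."""
    return len(zlib.compress(sequence.encode("ascii"), 9))


n = 4096
ordered = "H" * n                       # all heads
periodic = "HT" * (n // 2)              # HTHT...
rng = random.Random(0)                  # fixed seed so the run is repeatable
irregular = "".join(rng.choice("HT") for _ in range(n))

for name, seq in (("ordered", ordered), ("periodic", periodic), ("irregular", irregular)):
    print(f"{name:9s} {len(seq)} symbols -> {compressed_size(seq)} bytes")
```

The ordered and periodic sequences shrink to a few dozen bytes, while the irregular one cannot be pushed much below one binary digit per toss.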
Kolmogoroff (1965a,b) has called the entropy or information content of the
shortest algorithm describing a sequence the complexity of the sequence. The
vast majority of long sequences have a large entropy and are therefore nearly
as complex as a random one. There are only two algorithms which can be
expressed by one digit, $2^2$ which can be expressed by two digits and so on.
The total number of algorithms shorter than $n-k$ is
$$2^1 + 2^2 + 2^3 + \dots + 2^{n-k-1} = 2^{n-k} - 2. \qquad (1)$$
The fraction $(2^{n-k} - 2)/2^n$ of algorithms have a complexity less than $n-k$
(Chaitin, 1975).
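The counting argument is easy to verify directly. The sketch below counts binary strings of each length by brute summation and compares the result with the closed form; the function names are hypothetical.

```python
from fractions import Fraction


def algorithms_shorter_than(length: int) -> int:
    """Number of binary strings with 1, 2, ..., length-1 digits."""
    return sum(2 ** i for i in range(1, length))


def compressible_fraction(n: int, k: int) -> Fraction:
    """Upper bound on the fraction of n-digit sequences that can have a
    complexity less than n - k."""
    return Fraction(algorithms_shorter_than(n - k), 2 ** n)


# The brute-force sum agrees with the closed form 2**(n-k) - 2.
assert algorithms_shorter_than(10) == 2 ** 10 - 2

# Fewer than one sequence in 2**k can be shortened by more than k digits.
for k in (1, 10, 20):
    print(k, float(compressible_fraction(100, k)))
```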
When these ideas are applied to considerations of the origin of life we
realize that we need an explanation not of the generation of order but rather
of complexity. Crystals are ordered ; informational biomolecules are
“aperiodic” as Schroedinger (1955) has said and therefore are complex. A
pursuit of the generation of order will end in crystallography not in biology.
Complexity has a mathematical meaning and a measure. We thus avoid the
confusion found in discussions in ordinary language which are caused by the
fact that the words “order”, “information”, “knowledge”, “complexity”,
“random” etc. have several independent meanings. The discussion given
above is largely verbal, of course. However, the justification is found in the
references (Martin-Löf, 1966; Kolmogoroff, 1968; Chaitin, 1974a,b) which
are lumpy with theorems. The remarks given above summarize the work in
the references cited sufficiently for the present purposes. This topic is not as
simple as our brief review may imply. For example, it is possible to prove a
sequence is orderly by finding an algorithm substantially shorter than the
sequence. On the other hand it is impossible to prove that no such short
algorithm exists and that a given sequence is indeed random. This is related
to Gödel’s incompleteness theorem (Chaitin, 19743). The reader who is
interested in pursuing this point further should consult the references cited
and it is suggested he begin with Chaitin (1975) and Kolmogoroff (1968).
3. The De Novo Appearance Probability of Informational Macromolecules

The above information-theoretic discussion puts us in a position to
consider the “warm little pond” scenario and to calculate the probability
that complexity will emerge and at what rate. We need a definition of life
and as Crick (1966) has pointed out this is hard to come by. Nevertheless,
the problem can be rendered more tractable if we note that the sequence
hypothesis and the central dogma provide us with a necessary if not sufficient
condition. According to the sequence hypothesis the significant properties of
biologically active proteins are determined by the exact sequence of amino
acid residues in the chain. Some residues are synonymous and so there is an
ensemble of homologous protein chains each of which is functionally
equivalent. We will recognize a necessary but not sufficient condition for the
emergence of life if the scenario under discussion can select at least one
message from an ensemble of messages of sufficient length and specificity.
The condition is not sufficient since we have yet to demonstrate that the
ensemble contains the particular genetic messages which constitute a genome.
Therefore, valid scenarios of the origin of life from nonliving matter must, at
least, show how at least one member of an ensemble of genetic messages,
that is, a genome, could have appeared by chance and natural causes from a
primeval milieu reflecting conditions believed to have existed on the primitive
earth. We will avoid the question of the code for the time being. Since the
force of natural selection is on the proteins we will take the usual approach
and consider first the probability that complexity could have been generated
in an ensemble of proteins. We could, of course, equally well assume the
primeval existence of a code (not necessarily the modern one) and carry out
these considerations with respect to a nucleic acid.
It is possible to specify the time available rather more exactly than most
other aspects of the problem. The earth is believed to be about $4.6 \times 10^{9}$ years
old. Fossil micro-organisms have been found in rocks formed about $3.3 \times 10^{9}$
years ago (Schopf & Barghoorn, 1967; Muir & Hall, 1974). Allowing time
for the oceans and the atmosphere to form one has a generous allowance of
about $1 \times 10^{9}$ years (Eigen, 1971; Shklovskii & Sagan, 1966; Miller & Orgel,
1974) during which valid scenarios must complete chemical evolution and
result in a successful generation of an ensemble of genetic messages coded in
informational biomolecules. Sagan & Drake (1975) allow their enthusiasm
to suggest this time is only several hundred million years.
Let us now consider the “warm little pond” scenario at the time when
chemical evolution is complete and “the phase of self organization” is ready
to begin. According to Shklovskii & Sagan (1966) and Eigen (1971) it is
reasonable to estimate that the primeval soup contained sufficient amino
acids to form $\approx 10^{42}$ chains of 100 sites each or a total of $\approx 10^{44}$ amino acid
molecules. Bar-Nun & Shaviv (1975) estimate 5.4 x 104’ amino acid mole-
cules. In order to favor the scenario we choose the larger number. We
imagine Lachesis as a deus ex machina who rolls her icosahedral dice for
$10^{9}$ years and each second arranges all $\approx 10^{44}$ residues in sequences of 101
sites each. Reliance upon the goddess to perform this task sets aside for the
time being questions of prebiotic synthesis, chemical stability, polymerization,
etc. (Hulett, 1969; Miller & Orgel, 1974). Let us first ask the probability
that, at least once, one of these sequences will be a member of a
modern protein family, namely cytochrome c. The total number of possible
sequences of 101 sites is
$$20^{101} = 2.535 \times 10^{131}. \qquad (2)$$
The residues are not all equally probable and as discussed by Shannon
(1948) and by Khinchin (1957) the number of sequences of length N except
for a set of very small probability, is
$$a^{NH} \qquad (3)$$
where
$$H = -\sum_{j} p_j \log_a p_j \qquad (4)$$
and $p_j$ are the residue probabilities. Equation (3) reduces to equation (2)
when all $p_j$ are equal. It is normal practice to set a = 2 and H is then
measured in bits. A good approximation to $p_j$ for the “average” protein is
obtained if
$$p_j = r_j p_i \qquad (5)$$
where $r_j$ is the number of degenerate codons coding for residue j and $p_i$ is
the codon probability. In order to reflect the low frequency of arginine one
arbitrarily assigns only two codons and if all remaining codons are of equal
probability, $p_i = 1/57$ (Yockey, 1974). Then,
H = 4.153 bits/residue. (6)
The number of sequences in a chain of length 101 is
$$2^{4.153 \times 101} = (2^{4.153})^{101} = (17.786)^{101} = 1.8067 \times 10^{126}. \qquad (7)$$
If the $p_j$ typical of cytochrome c from column five of Table 3 of Yockey
(1977a) are taken, H = 4.1985 and the number of sequences is $4.49 \times 10^{127}$.
The simple combinatorial analysis used by previous authors is clearly
inadequate. Information theory shows that, in this case, the actual number
of sequences is smaller than the total possible number by a factor of $10^{5}$.
Only a tiny fraction of these sequences will carry specificity.
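The entropy and the sequence counts of equations (2), (6) and (7) can be reproduced with a few lines. The sketch assumes the codon degeneracies of the standard genetic code, with arginine reduced to two codons as described above; the dictionary and helper names are illustrative and do not come from the paper.

```python
import math

# Codons per amino acid in the standard genetic code, except that arginine
# is assigned 2 codons instead of 6, following the text (57 codons in all).
CODONS = {
    "ala": 4, "arg": 2, "asn": 2, "asp": 2, "cys": 2, "gln": 2, "glu": 2,
    "gly": 4, "his": 2, "ile": 3, "leu": 6, "lys": 2, "met": 1, "phe": 2,
    "pro": 4, "ser": 6, "thr": 4, "trp": 1, "tyr": 2, "val": 4,
}


def entropy_bits(probabilities) -> float:
    """Shannon entropy in bits, equation (4) with a = 2."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def sci(log10_value: float):
    """Split a base-10 logarithm into (mantissa, exponent)."""
    exponent = math.floor(log10_value)
    return 10 ** (log10_value - exponent), exponent


N_SITES = 101
total_codons = sum(CODONS.values())
H = entropy_bits(r / total_codons for r in CODONS.values())

log10_all = N_SITES * math.log10(20)          # equation (2): 20**101
log10_typical = N_SITES * H * math.log10(2)   # equation (3): 2**(N*H)

print(f"H = {H:.3f} bits/residue")
print("all sequences:     %.3f x 10^%d" % sci(log10_all))
print("typical sequences: %.4f x 10^%d" % sci(log10_typical))
print(f"ratio = {10 ** (log10_all - log10_typical):.2e}")
```

Working with logarithms keeps the numbers within floating-point range; the ratio of the two counts comes out a little above $10^{5}$, in line with the factor quoted above.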
Let us now calculate the number of sequences of N sites when the amino
acid residues are selected only from those which are synonymous at a given
site in a homologous protein family (Yockey, 1977a). If all residues were
equally probable the total number of such sequences is the product of the
number of residues which are synonymous at each site. However, as we have
shown above, the fact that the residues are not all equally probable must be
taken into account. The entropy of the probability distribution of the
synonymous residues at site $l$ is
$$H_l = -\sum_j p'_j \log_2 p'_j \qquad (8)$$
where
$$p'_j = \frac{p_j}{\sum_j p_j}. \qquad (9)$$
The summation in equation (9) is taken only over the synonymous residues.
The effective number of synonymous residues at site $l$ is
$$2^{H_l} = N^{l}_{\mathrm{eff}}. \qquad (10)$$
The number of different sequences which may be selected from the chain is
the product of $N^{l}_{\mathrm{eff}}$ for all sites ($\prod_l N^{l}_{\mathrm{eff}}$).
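The effective number of synonymous residues at a single site can be computed as sketched below. The sketch assumes the residue probabilities $p_j = r_j/57$ with arginine assigned two codons; because equation (9) renormalizes over the synonymous residues, only the relative codon counts matter. The residue lists are taken from Table 1, and the function name is hypothetical.

```python
import math

# Codon counts r_j (standard genetic code, arginine reduced to 2 codons).
CODONS = {
    "ala": 4, "arg": 2, "asn": 2, "asp": 2, "cys": 2, "gln": 2, "glu": 2,
    "gly": 4, "his": 2, "ile": 3, "leu": 6, "lys": 2, "met": 1, "phe": 2,
    "pro": 4, "ser": 6, "thr": 4, "trp": 1, "tyr": 2, "val": 4,
}


def n_eff(residues, weights=CODONS) -> float:
    """Effective number of synonymous residues at one site, 2**H_l,
    following equations (8) to (10)."""
    w = [weights[name] for name in residues]
    total = sum(w)                                   # denominator of equation (9)
    h = -sum((x / total) * math.log2(x / total) for x in w)
    return 2 ** h


# Three residue lists that appear in Table 1.
print(round(n_eff(["asn", "asp", "ser"]), 3))
print(round(n_eff(["ile", "leu", "val", "thr", "met", "tyr"]), 3))
print(round(n_eff(["ala", "thr", "val", "pro"]), 3))
```

When the synonymous residues are equally probable, as for the four-codon residues ala, thr, val and pro, the effective number equals the plain count; unequal probabilities always reduce it.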
The number of cytochrome c sequences and the probability of selection by
chance in one trial from a pool of amino acids containing only one optical
isomer can now be obtained. In the discussion of cytochrome c given
previously (Yockey, 1977a) the synonymous residues presently known by
experiment were listed at each site for all cytochrome c. Since the experi-
mental data are incomplete a similar list was prepared including those
residues which are predicted by a prescription (Yockey, 1977b) based on
Grantham’s criterion of the chemical similarity of amino acids (Grantham,
1974).
This list is reported in Table 1 where the value of $N^{l}_{\mathrm{eff}}$ for each site in
cytochrome c for the case wherein the amino acid distribution is $p_j = r_j/57$
has been calculated. The number of different cytochrome c sequences is
$3.8 \times 10^{61}$. The probability that the deus ex machina will find a member of
the cytochrome c family in a given set of 101 rolls of her icosahedral dice is
$2.1 \times 10^{-65}$. In the $10^{9}$ years allowed by geological evidence the goddess
will have performed $3.15 \times 10^{58}$ trials using the $10^{44}$ amino acids. We must
imagine this exercise going on in all $10^{8}$ “acceptable planets” in the universe
(Shapley, 1958; Miller & Orgel, 1974) in order to have a reasonable expectation
of selecting at least once a member of the ensemble of $3.8 \times 10^{61}$
cytochrome c sequences in only ten of them. This result is sensitive to the $p_j$
distribution which is assumed. For example, if all residues are equally
probable there are $6.0 \times 10^{63}$ cytochrome c sequences and the probability of
selecting one member of this ensemble is $2.4 \times 10^{-68}$.
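The arithmetic behind these figures can be retraced as follows. The sketch takes the sequence counts quoted in the text as inputs; the value of $3.15 \times 10^{7}$ seconds per year and the round figure of $10^{42}$ chains per second are assumptions chosen to match the stated number of trials, and all variable names are illustrative.

```python
# Inputs quoted in the text.
CYTOCHROME_C_SEQUENCES = 3.8e61        # product of N_eff over all sites
TYPICAL_SEQUENCES = 1.8067e126         # equation (7)
UNIFORM_CYT_C_SEQUENCES = 6.0e63       # all residues equally probable
ALL_SEQUENCES = 2.535e131              # equation (2)

# Assumptions of this sketch.
YEARS = 1e9
SECONDS_PER_YEAR = 3.15e7              # rounded length of a year in seconds
CHAINS_PER_SECOND = 1e42               # about 1e44 residues in chains of about 100 sites

p_single = CYTOCHROME_C_SEQUENCES / TYPICAL_SEQUENCES
p_uniform = UNIFORM_CYT_C_SEQUENCES / ALL_SEQUENCES
trials = YEARS * SECONDS_PER_YEAR * CHAINS_PER_SECOND
expected_hits = trials * p_single      # expected successes on one planet

print(f"p (one trial)        = {p_single:.2e}")
print(f"p (equal residues)   = {p_uniform:.2e}")
print(f"trials in 1e9 years  = {trials:.2e}")
print(f"expected successes   = {expected_hits:.2e}")
```

The expected number of successes on a single planet is far below one, which is why the text must multiply the exercise over many planets.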
It has not been previously noticed that chemical evolution would have
provided much material which the deus ex machina would have to discard.
Many of the $10^{44}$ amino acid molecules which would be formed by chemical
evolution are nonbiological (Evered, 1974; Brack & Orgel, 1975; Lawless &
Boynton, 1973). The incorporation of any nonbiological residues or
analogues, even a wrong optical isomer, in a protein would greatly reduce
or destroy its specificity. This effect is most important in the case of the longer
amino acid sequences. A well established estimation of the frequencies of
compounds in the primitive milieu which can be incorporated in an amino
acid sequence is not available in the literature. We can, however, illustrate
the point by taking into account only the optical isomers. We assume
the $p_j$ as given in equation (5) for the amino acids which compose modern
protein and assign $p_j/2$ for each optical isomer except for glycine which is
not optically active. In this example there are 39 kinds of amino acids. The
values of $N^{l}_{\mathrm{eff}}$ calculated from equation (10) using the above values of the
frequencies of the amino acids and their isomers are listed in parentheses in
Table 1 for each site containing glycine. If glycine is not a synonymous
residue $N^{l}_{\mathrm{eff}}$ is unaffected. The number of sequences of cytochrome c is now
$7.25 \times 10^{60}$; the number of sequences for 101 sites is $3.4 \times 10^{154}$. Therefore
the probability of selecting a member of the cytochrome c family with the
same optical isomers in a given set of 101 rolls of the icosahedral dice is
$2.15 \times 10^{-94}$. Clearly $10^{9}$ years is far too short a time and the universe is far
too small for the goddess to select even one molecule of cytochrome c from
the primitive milieu. Therefore a belief that proteins basic for life as we know
it appeared spontaneously in the primitive milieu on earth is based on faith.
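The effect of the racemic mixture can be sketched numerically. Each optically active amino acid is split into two isomers of half weight while glycine keeps its full weight, giving 39 kinds. The isomer labels ("L-", "D-") and function names are illustrative choices of this sketch, and the codon counts assume arginine reduced to two codons as before.

```python
import math

CODONS = {
    "ala": 4, "arg": 2, "asn": 2, "asp": 2, "cys": 2, "gln": 2, "glu": 2,
    "gly": 4, "his": 2, "ile": 3, "leu": 6, "lys": 2, "met": 1, "phe": 2,
    "pro": 4, "ser": 6, "thr": 4, "trp": 1, "tyr": 2, "val": 4,
}


def racemic_weights(codons):
    """Relative weights for the 39 kinds: two half-weight isomers for each
    optically active amino acid, full weight for glycine."""
    weights = {}
    for name, r in codons.items():
        if name == "gly":
            weights["gly"] = float(r)
        else:
            weights["L-" + name] = r / 2
            weights["D-" + name] = r / 2
    return weights


def entropy_bits(weights) -> float:
    """Shannon entropy in bits of the normalized weights."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights)


RACEMIC = racemic_weights(CODONS)


def n_eff_racemic(residues) -> float:
    """N_eff at a site when only one optical isomer of each residue is acceptable."""
    keys = ["gly" if name == "gly" else "L-" + name for name in residues]
    return 2 ** entropy_bits([RACEMIC[k] for k in keys])


H39 = entropy_bits(list(RACEMIC.values()))
log10_sequences = 101 * H39 * math.log10(2)
exponent = math.floor(log10_sequences)
mantissa = 10 ** (log10_sequences - exponent)

print(len(RACEMIC), "kinds of amino acids")
print(f"H = {H39:.3f} bits, sequences of 101 sites = {mantissa:.1f} x 10^{exponent}")
print(round(n_eff_racemic(["thr", "gly", "ser"]), 3))
print(f"probability = {7.25e60 / (mantissa * 10.0 ** exponent):.2e}")
```

Only sites that admit glycine change their effective number, since for every other site the halving of all weights cancels in the normalization of equation (9).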
TABLE 1
$N^{l}_{\mathrm{eff}}$ for each site in cytochrome c

| Residue site† | Synonymous residues‡ | $N^{l}_{\mathrm{eff}}$ § |
|---|---|---|
| 9 | gly | 1.000 |
| 10 | asn asp ser | 2.586 |
| 11 | val ile ser pro ala glu lys asn arg thr his gln met leu gly tyr | 14.404 (13.891) |
| 12 | glu ala asp lys thr arg pro his gln asn ser | 10.041 |
| 13 | lys asn ser ala thr val arg pro gly gln glu his | 11.091 (10.585) |
| 14 | gly | 1.000 |
| 15 | gly lys glu gln ala ser asp thr arg pro his asn | 11.014 (10.441) |
| 16 | asp lys thr asn arg his gln glu | |
| 17 | ile leu val thr met tyr | 5.310 |
| 18 | phe | 1.000 |
| 19 | ile val thr lys glu arg pro ala tyr gln met his | 11.130 |
| 20 | met gln thr glu gly ser ile leu asn arg pro val his lys ala tyr | 14.404 (13.891) |
| 21 | lys arg | 2.000 |
| 22 | cys ala | 1.000¶ |
| 23 | ser ala leu glu asn arg pro thr val gly ile his gln lys met tyr | 14.404 (13.891) |
| 24 | gln leu glu arg pro thr ala val met ile tyr his lys | 11.768 |
| 25 | cys | 1.000 |
| 26 | his | 1.000 |
| 27 | thr gly ser | 2.942 (2.889) |
| 28 | val cys glu gly ala leu ile arg pro thr phe tyr his gln asn asp ser met | 16.205 (15.620) |
| 29 | glu asp gly ala gln arg pro thr his asn ser | 10.106 (9.585) |
| 30 | lys asn ala glu leu arg pro thr val his gln ile tyr met ser gly | 15.301 (14.752) |
| 31 | gly ala asn gln pro thr glu his ser | 8.320 (7.913) |
| 32 | gly ala val glu arg pro thr his gln | 8.533 (8.000) |
| 33 | lys gly pro ala asn ser thr arg his gln asp glu | 11.014 (10.441) |
| 34 | . . . | |
| 35 | lys gly ser arg pro thr ala glu his gln asn asp | 11.014 (10.441) |
| 36 | thr val gln ile glu arg pro ala tyr his lys met | 11.130 |
| 37 | gly | 1.000 |
| 38 | pro | 1.000 |
| 39 | asn ala ser pro thr gly his gln glu | 8.320 (7.913) |
| 40 | leu | 1.000 |
| 41 | his asn gln ser tyr trp phe arg leu pro thr ala val ile lys glu met | 14.981 |
| 42 | gly | 1.000 |
43   leu ile phe val met    4.312
44   phe ile tyr val met leu    5.196
45   gly ser asn thr    3.747 (3.596)
46   arg    1.000
47   lys gln his thr arg glu    5.742
48   gln thr ser asn    3.586
49   gly    1.000
50   gln thr ser asn    3.586
51   ala thr val pro    4.000
52   pro val glu gln asp ala ser arg thr gly his asn lys    11.994 (11.440)
53   gly ser    1.960 (1.980)
54   tyr phe    2.000
55   ser thr ala pro gly his gln    6.586 (6.305)
56   tyr    1.000
57   thr ser    1.960
58   ala asp asn thr glu lys arg pro his gly ser gln    11.014 (10.441)
59   ala gly pro thr
60   asn    1.000
61   lys ile ala pro thr val his met    7.396
62   asn ser arg gln lys ala asp gly pro thr his glu    11.014 (10.441)
63   lys met ala ser arg pro thr his gln val gly glu    10.845 (10.335)
64   gly ala asn pro thr his gln glu ser    8.320 (7.913)
65   ile val gln pro thr arg met his lys
66   ile thr val gln leu glu asn arg pro ala gly tyr his lys met ser    14.404 (13.891)
67   trp    1.000
68   gly asn gln ala thr asp glu ser lys pro his arg    11.014 (10.441)
69   gln asp asn tyr pro glu his thr arg ala val gly lys met ser    13.578 (12.930)
70   asp glu asn gln pro lys ala arg thr his    9.441
71   thr asp asn val arg pro ala gly his gln lys glu ser    11.994 (11.440)
72   leu met phe ile    3.316
73   met phe ser arg tyr asp his leu pro thr ala val gly ile gln asn lys glu    16.205 (15.620)
74   glu ile val asp trp leu tyr ser lys gln arg pro thr ala phe his asn met    15.896
75   tyr phe    2.000
76   leu    1.000
77   glu leu thr arg pro ala val ile tyr his gln lys met    11.768
78   asn    1.000
79   pro    1.000
80   lys    1.000
81   lys    1.000
82   tyr phe    2.000
83   ile met val    2.649
84   pro    1.000
85   gly    1.000
86   thr    1.000
87   lys    1.000
88   met    1.000
89   ile val ala ser arg pro thr gly his gln lys glu met    11.838 (11.305)
90   phe    1.000
91   val ala thr pro gly his gln    6.735
92   gly    1.000
93   ile leu    1.890
94   lys ser arg his gln asn glu pro thr ala    9.125
95   lys ala gln pro thr val his met    7.318
96   lys pro ala glu asp thr asn ser his gln arg gly    11.014 (10.441)
97   glu gly thr ser ala asn gln asp lys pro his arg    11.014 (10.441)
98   glu asp gln asn    4.000
99   arg    1.000
100  ala glu val asn gly gln thr lys arg pro his ser    11.091 (10.585)
101  asp asn his gln glu    5.000
102  leu ile    1.890
103  leu ile val met    3.454
104  ala thr ser pro gly his gln    6.586 (6.305)
105  phe tyr    2.000
106  leu met ile    2.455
107  leu lys val glu arg pro thr ala ile tyr his gln met    11.768
108  lys asp gln glu thr ser ile asn his arg pro ala val gly tyr met    14.577 (13.915)
109  ala lys ser leu glu thr arg pro val gly ile his gln met    12.631 (12.196)
110  thr cys ser lys ala arg leu pro val gly ile tyr his gln asn asp glu met    16.205 (15.620)
† Numbering as in Dayhoff (1972).
‡ From Yockey (1977a,b); prescription residues in italics.
§ Where a number appears in parentheses it reflects the effect of optical isomerism as
discussed in the text. All other values of $N_{\mathrm{eff}}$ are independent of optical isomerism.
¶ We have ignored ala at this site because of the S-S bond with cys at site 25 (Yockey,
1977b).
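Individual entries of Table 1 can be recomputed with a short script. Equation (10) is not reproduced in this part of the paper, so the sketch below rests on assumptions: that $N_{\mathrm{eff}} = 2^{H}$, with $H$ the Shannon entropy of residue probabilities proportional to codon degeneracy; that arginine carries a weight of 2 (a choice made here only because it reproduces the tabulated values); and that in a racemic mixture every residue except glycine has its weight halved. The names `CODON_WEIGHT` and `n_eff` are hypothetical.

```python
import math

# Assumed weights: number of codons per residue, except arginine, which is
# set to 2 because that value reproduces the tabulated entries.
CODON_WEIGHT = {
    "ala": 4, "arg": 2, "asn": 2, "asp": 2, "cys": 2, "gln": 2, "glu": 2,
    "gly": 4, "his": 2, "ile": 3, "leu": 6, "lys": 2, "met": 1, "phe": 2,
    "pro": 4, "ser": 6, "thr": 4, "trp": 1, "tyr": 2, "val": 4,
}

def n_eff(residues, racemic=False):
    """Effective number of synonymous residues, taken here as 2 ** H."""
    weights = []
    for name in residues:
        weight = CODON_WEIGHT[name]
        # Glycine has no optical isomer; every other residue loses half of
        # its weight when only one isomer is acceptable.
        if racemic and name != "gly":
            weight = weight / 2
        weights.append(weight)
    total = sum(weights)
    entropy = -sum((w / total) * math.log2(w / total) for w in weights)
    return 2 ** entropy

print(round(n_eff(["asn", "asp", "ser"]), 3))            # site 10
print(round(n_eff(["thr", "gly", "ser"]), 3))            # site 27
print(round(n_eff(["thr", "gly", "ser"], racemic=True), 3))
```

Because halving every weight leaves the normalized probabilities unchanged, the racemic correction in this sketch alters only sites that contain glycine, which matches the footnote to the table.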
It is a legitimate part of the “warm little pond” scenario to say that modern
protein families were derived by evolution from smaller and more primitive
ones. In order to see if such an evolution could get started, let us now
calculate the length of a genome which could be generated with a confidence
comparable to that required of other scientific hypotheses. The effective
number of kinds of biological amino acids including their optical isomers is
$2^{H} = 2^{5.0825} = 33.8823$. The geometrical average of the number of
synonymous residues at each site in cytochrome c is the 101st root of $7.25 \times 10^{60}$,
or 4.0047. The probability of selecting a sequence of length $N$ which is a
member of a protein family of information content typical of cytochrome c
(Yockey, 1977a) is
$$\left(\frac{4.0047}{33.8823}\right)^{N}. \tag{11}$$
The product of the number of trials and the probability of success at each
trial is equal to the Poisson parameter $\lambda$. Let $N$ be the longest chain such that
the probability that the goddess will select a member of an homologous
protein family, at least once, is 0.95. For this probability $\lambda = 3$. The value
of $N$ is a solution of the following equation:
$$\frac{10^{44}}{N} \times 3 \times 10^{16} \times \left(\frac{4.0047}{33.8823}\right)^{N} = 3, \tag{12}$$
where $3 \times 10^{16}$ is the number of seconds in $10^{9}$ years. The integral value of
$N$ which satisfies this equation is 63.
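The chain-length estimate can be explored numerically. The sketch below is an illustration under stated assumptions, not the author's procedure: it reads the equation above as $10^{44}/N$ chains, each tried once per second for $3 \times 10^{16}$ seconds, with the per-trial success probability of equation (11). The function name `poisson_parameter` is hypothetical.

```python
import math

RATIO = 4.0047 / 33.8823   # per-site chance of drawing an acceptable residue
RESIDUES = 1e44            # amino acid residues assumed available
SECONDS = 3e16             # seconds in 10**9 years

def poisson_parameter(n):
    """Expected number of successes for chains of n residues.

    Assumption: one trial per chain per second, with RESIDUES / n chains.
    """
    chains = RESIDUES / n
    trials = chains * SECONDS
    return trials * RATIO ** n

# A Poisson parameter of 3 gives a 0.95 chance of at least one success.
print(f"P(at least one success | lambda = 3) = {1 - math.exp(-3):.4f}")

for n in (61, 62, 63, 64):
    print(n, f"{poisson_parameter(n):.3g}")
```

Under this reading the expected number of successes falls through 3 between $N = 62$ and $N = 63$, which is where the text places the answer. Each extra residue divides the expectation by roughly 8.5, so the result is insensitive to modest changes in the assumed number of trials.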
This scenario is obviously highly artificial. Let us make it more realistic
by assuming that all $10^{44}$ residues are coded from nucleic acids of length $3N$
nucleotides and that there is a mutation probability $\alpha$ per nucleotide per year.
In Yockey (1974) we showed that the degeneracy of the genetic code allows
only 6.509 of the nine possible single base interchanges to code a new
residue. The number of trials in $10^{9}$ years in a nucleotide chain coding for
$N$ residues is
$$3N \times \frac{6.509}{9} \times \alpha \times 10^{9}. \tag{13}$$
The product of the number of trials and the probability of success at one
trial is equal to $\lambda$:
$$\frac{10^{44}}{N} \times 3N \times \frac{6.509}{9} \times 10^{9}\,\alpha \times \left(\frac{4.0047}{33.8823}\right)^{N} = 3, \tag{14}$$
where $\lambda$ is again chosen such that the probability of at least one success in
$10^{9}$ years is 0.95.
The constancy of $\alpha$ in various taxonomies and over geological time is
controversial at present. We are not entering this controversy but rather we
examine the consequences if $\alpha$ is constant. The final conclusion is sensitive
only logarithmically to $\alpha$. According to Kimura & Ohta (1974) the mutation
rate in fibrinopeptides is $9 \times 10^{-9}$ amino acids per year. This figure is
presumably close to the basic mutation rate. When we correct for the code
degeneracy $\alpha = 1.24 \times 10^{-8}$ per base per year. The integral value of $N$ which
satisfies equation (14) is 49. Since we have neglected the effect of the
non-biological amino acids and the analogues this still represents a very optimistic
estimate of the genome size which could be generated by this scenario.
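The mutation-driven version of the estimate can be sketched the same way. The code below assumes that the $10^{44}$ residues are divided into chains of $N$ residues, that each chain undergoes $3N \times (6.509/9) \times \alpha \times 10^{9}$ residue-changing mutations in $10^{9}$ years, and that each mutation counts as one trial with the success probability of equation (11). This is one reading of equation (14), and the names used are illustrative.

```python
RATIO = 4.0047 / 33.8823          # per-site chance of an acceptable residue
RESIDUES = 1e44                   # amino acid residues assumed available
YEARS = 1e9                       # duration of the scenario
EFFECTIVE_FRACTION = 6.509 / 9    # base changes that code a new residue
ALPHA = 9e-9 / EFFECTIVE_FRACTION # fibrinopeptide rate corrected for degeneracy

def poisson_parameter(n):
    """Expected successes when trials arise by mutation (assumed reading)."""
    chains = RESIDUES / n
    trials_per_chain = 3 * n * EFFECTIVE_FRACTION * ALPHA * YEARS
    return chains * trials_per_chain * RATIO ** n

print(f"alpha = {ALPHA:.3g} per base per year")
for n in (47, 48, 49, 50):
    print(n, f"{poisson_parameter(n):.3g}")
```

With these inputs the expectation drops below 3 between $N = 48$ and $N = 49$. Because $N$ cancels between the number of chains and the number of trials per chain, the chain length enters only through the exponent, which is why the result depends on $\alpha$ only logarithmically.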
4. Discussion and Conclusions

In order to consider the spontaneous generation of life by chance we have
employed some new ideas about order and randomness from information
theory. All scientific data can be put in the form of a sequence or ensemble
of sequences. This includes the ensemble of sequences of nucleotides which
constitute a genome. Information theory shows that whether a finite sequence
was generated by a series of stochastic events or by an algorithm which has
an appropriately high information content is fundamentally undecidable. A
stochastic process can, indeed, generate an orderly sequence which may be
described by a very short algorithm and which has a small information
content. The neglect of this point is fatal to Monod’s philosophy of chance
and necessity (Monod, 1971). The informational biomolecules, i.e., enzymes,
etc. involved in the crucially important biochemical pathways are complex
in the information-theoretic meaning of this term (Kolmogoroff,
1965). A scenario for the origin of life must present a plausible means of
generating complexity, not order or negentropy.
Let us return to our necessary if not sufficient condition for the spontaneous
generation of life from nonliving matter, namely, the generation of a genome.
The very optimistically large estimate of the genome size given above, which
might be generated according to the scenario, represents not a system but
rather only one protein which is very small indeed. It falls very far short of
the genome size of E. coli or even that of viruses. Organisms, even the proto-
biont, are composed of systems of informational biomolecules of high
specificity. Kaplan (1974) guesses that the protobiont genome controlled
twenty to forty species of functional proteins of seventy to 100 residues in
length and roughly the same number of nucleic acids. In order to control a
system the genome of the protobiont must have coded all these substances on
a single informational macromolecule. However, the scenario does not
generate even one molecule of the biopolymers of reasonable specificity
from which the nonlinear processes of evolution considered by Eigen (1971)
and by Prigogine, Nicolis & Babloyantz (1972) could start.
It has not been noticed previously that the probability of spontaneous
biogenesis depends critically on chemical evolution providing precisely the
right relative concentration of amino acids. Considerations of the formation
of sequences from only two to four amino acid residues pertain to high
polymer chemistry but not to the origin of life. Contamination of the primitive
soup by nonbiological amino acids, the racemic nature of the biological
amino acids and the analogues drastically lower the appearance probability
of the longer chains. When these effects are taken into account the genome
of 49 residues will be considerably reduced. And, of course, the reproduction
of proteins by proteins is a violation of the central dogma.
Part of these difficulties may be avoided by assuming that the genome was
generated in a nucleic acid. The helical structure of these compounds would
select always one optical isomer for the same reasons that cause the two
forms to crystallize separately (Miller & Orgel, 1974). This means that a
genetic code of some sort must have existed from the very beginning of life
(Miller & Orgel, 1974). Carter & Kraut (1974) also made this suggestion and
further proposed that the helical structure was the basis of a fundamental
complementarity between RNA and polypeptides. This may have provided
some degree of coding but they claim no evident relationship with the modern
genetic code. This approach is not without difficulties of its own. For example,
Sulston, Lohrmann, Orgel & Miles (1968) report experimental evidence for
a preference for 2'-5' linkage in polymerization studies. Other difficulties are
discussed by Miller & Orgel (1974). Woese (1972) and Yčas (1974) have
suggested that translation of the primitive code was “a very error ridden
process”. If so, this decreases further the probability of spontaneous bio-
genesis since even if a protobiont genome appeared it might not have been
able to function or to reproduce adequately and would therefore have become
extinct.
With regard to the appearance of a single molecule of the cytochrome c
family, even the deus ex machina needs $10^{36}$ “acceptable planets” with just
the right conditions for $10^{9}$ years. Since there are believed to be only $10^{80}$
elementary particles in the universe (Misner, Thorne & Wheeler, 1970;
Shklovskii & Sagan, 1966) there clearly cannot be enough “acceptable
planets” to provide Lachesis with the material to conduct her search. One
who finds the chance appearance of cytochrome c a credible event must have
the faith of Job.
Advocates of the “warm little pond” scenario have noticed before that
specific informational biomolecules are highly unlikely, namely, Quastler
(1964), Salisbury (1969), Kaplan (1974), Eigen (1971). Quastler (1964) and
Kaplan (1974) have sought an escape from this dilemma by assuming many
more synonymous residues than the facts justify. Quastler (1964) guessed that
between two and seven invariant residues at the active site are required in
chymotrypsin. Kaplan (1974) guessed an average of five invariants at the
active site in a 100 site chain and an average of thirteen equally probable
synonymous residues on the covariant sites. Inspection of Table 1 shows
that since residues are not equally probable there must be between 14 and
15 synonymous residues to provide $N_{\mathrm{eff}} = 13$. Very few cytochrome c sites
meet this criterion. Cytochrome c has 27 invariant sites according to present
knowledge. The geometrical average of the number of synonymous residues
at the variable sites is the seventy-fourth root of $3.8 \times 10^{61}$, or 6.794. Because
of the very fundamental function of the cytochromes (Yamanaka, 1973),
the histones and other proteins (Wooten, 1974; Rossmann, Moras & Olsen,
1974; Wickramasinghe & Villee, 1975) which are believed to be of very ancient
and even precellular origin one cannot relax the specificity requirement
derived from cytochrome c. We must conclude that Quastler (1964) and
Kaplan (1974) greatly underestimated the information content of real
homologous protein families which must have existed in quantity $3.3 \times 10^{9}$
years ago. This leads to an enormously large overestimation of the appear-
ance probability. The reader may substitute numbers which he regards
reasonable in the equations given here. It will be found that there is a funda-
mental inconsistency between the probability of the selection by chance of a
genome large enough to code a living system even of optimistic simplicity
and the biochemical specificity needed to carry out its biological functions.
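Readers who wish to substitute their own numbers can start from the two geometrical averages quoted in the text. The sketch below recomputes them from the family sizes given in the paper; the helper name `geometric_mean_per_site` is hypothetical, and the rounded inputs mean the last digit may differ slightly from the printed values.

```python
def geometric_mean_per_site(sequence_count, sites):
    """Average number of synonymous residues per site for a family size."""
    return sequence_count ** (1.0 / sites)

# All 101 sites, family size corrected for optical isomerism.
all_sites = geometric_mean_per_site(7.25e60, 101)

# The 74 variable sites only (101 sites minus 27 invariant ones).
variable_sites = geometric_mean_per_site(3.8e61, 74)

print(f"all 101 sites:     {all_sites:.4f}")
print(f"74 variable sites: {variable_sites:.3f}")
```

The second figure is the one to compare with Kaplan's assumed average of thirteen equally probable residues on the variable sites.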
The results of the calculations given in this paper do not agree with certain
widely held beliefs. “The further synthesis of the building blocks into the
macromolecules, especially nucleic acids and proteins, essential for life has
not yet been accomplished under realistically primitive conditions. Never-
theless, it is reasonable to assume that those steps, too, would occur deter-
ministically, inevitably, if given enough time under conditions likely to hold
on some primitive planets. It is also clear that there has been enough time for
the earth is now definitely known to be more than three billion years old,
and planets still older could well exist in this and other galaxies” (Simpson,
1964). “As a result of such instability, the nucleation of this functional
correlation (we may call it the origin of life) turns out to be an inevitable
event-provided favorable conditions of free energy flow are maintained over
a sufficiently long period of time,” (Eigen, 1971). Many other authors have
expressed similar beliefs (Shklovskii & Sagan, 1966; Miller & Orgel, 1974).
If the primitive soup ever really existed, traces other than the origin of
life, to explain which it was invented, are to be expected. Some of the primitive
proteins or other polymers which it describes would be insoluble. They would
have precipitated out and accumulated in very old sediments. According to
Brooks & Shaw (1973) massive sediments containing nitrogenous organic
compounds should have been found. Alternatively, metamorphosed sediment
containing nitrogenous cokes should have been reported. In fact, organic
matter in very old sediments is extremely short of nitrogen. Such a precipi-
tation would in any event deplete the primitive soup and depress the buildup
of amino acids and other components of informational biomolecules.
Hulett (1969) has pointed out that the physical and chemical processes
postulated to have been responsible for the formation of the amino acids,
purines, pyrimidines, etc. also degrade such compounds. The concentrations
will approach an equilibrium which may be very much smaller than the $10^{44}$
amino acids on which Shklovskii & Sagan (1966) and Eigen (1971) based
their considerations or the value of $5.4 \times 10^{41}$ amino acids estimated by
Bar-Nun & Shaviv (1975). One has less difficulty with this objection since the
probability of success of scenarios discussed above is sensitive only
logarithmically to the total number of biological building stones. One is
more concerned with concentration and concentration ratios. It is possible
to believe that situations similar to those in which salt beds were formed
existed on the primitive earth (Hulett, 1969; Miller & Orgel, 1974) and were
responsible for concentration of amino acids in evaporating pools. Hulett
concludes a discussion of the chemical problems in some detail with the
remark that the scenario may be incorrect and that it is certainly incomplete.
Miller & Orgel (1974) discuss the stability of prebiotic organic compounds
and conclude that the “warm little pond” was more likely a semi-frozen
sea. This is inconsistent with geological evidence that the oceans were not
frozen in the epoch 3 to $4 \times 10^{9}$ years ago (Donn, Donn & Valentine, 1965).
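The remark that the outcome is sensitive only logarithmically to the total number of building stones can be illustrated numerically. The sketch below is an assumption-laden illustration: it uses the simple one-trial-per-second reading of equation (12), treats $N$ as a continuous variable, and solves for it by bisection for the two amino acid inventories mentioned above. The function name `chain_length` is hypothetical.

```python
import math

LOG_RATIO = math.log10(4.0047 / 33.8823)  # log10 of per-site success chance
LOG_SECONDS = math.log10(3e16)            # log10 of seconds in 10**9 years
LOG_LAMBDA = math.log10(3.0)              # log10 of the target Poisson parameter

def chain_length(residues):
    """Continuous N at which (residues / N) * 3e16 * ratio ** N equals 3."""
    def excess(n):
        # Positive while the expected number of successes still exceeds 3.
        return math.log10(residues / n) + LOG_SECONDS + n * LOG_RATIO - LOG_LAMBDA

    low, high = 1.0, 500.0
    for _ in range(100):  # bisection on a strictly decreasing function
        mid = (low + high) / 2
        if excess(mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2

n_large = chain_length(1e44)     # inventory used by Shklovskii & Sagan, Eigen
n_small = chain_length(5.4e41)   # inventory estimated by Bar-Nun & Shaviv

print(f"N for 1e44 residues:   {n_large:.2f}")
print(f"N for 5.4e41 residues: {n_small:.2f}")
print(f"difference:            {n_large - n_small:.2f}")
```

Under these assumptions, reducing the inventory by a factor of about 185 shortens the attainable chain by fewer than three residues.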
The origin of life is a multidisciplinary field in which astrophysicists,
chemists, geologists, etc. as well as biologists have a role to play. The astro-
physicist must prepare a reliable description of the primitive earth in which
the “warm little pond” is thought to have appeared. This is part of the
scenario for the evolution and structure of stars and planets from the
beginning of the universe (Clayton & Woosley, 1974). The record of astrophysicists
in making predictions is most impressive but it is not perfect. A
few years ago one of the more well established subjects in astrophysics was
the theory of energy generation in the sun and other stars. The thermo-
nuclear processes which, according to established ideas, are the primary
source of energy must generate a certain flux of neutrinos at the earth.
Unfortunately, the measured flux is less than one fifth of the predicted value
and may be zero (Bahcall & Davis, 1976). “This situation has advanced in
the past years from being merely difficult to understand to being impossible
to live with” (Trimble & Reines, 1973). The “solar neutrino problem” is
very deeply involved with the description of the evolution of stars and planets
(Clayton & Woosley, 1974). No one can know how revolutionary its final
solution will be. A number of other “backyard problems” in physics have
been solved only with a drastic revision of previously “well established”
notions (Bahcall & Davis, 1976). Until this and other questions such as
the “faint young sun problem” (Sagan & Mullen, 1972) are resolved, the
biologist is justified in entertaining a creative skepticism with regard to
current descriptions of the environment of the earth $4.3 \times 10^{9}$ to $3.3 \times 10^{9}$
years ago. The neutrino was originally an ad hoc assumption to save the
principle of conservation of energy in $\beta$ decay. It remained so for many
years until it was finally detected. The “warm little pond” scenario was
invented ad hoc to serve as a materialistic reductionist explanation of the
origin of life. It is unsupported by any other evidence and it will remain ad
hoc until such evidence is found. Even if it existed, as described in the
scenario, it nevertheless falls very far short indeed of achieving the purpose
of its authors even with the aid of a deus ex machina. One must conclude
that, contrary to the established and current wisdom a scenario describing
the genesis of life on earth by chance and natural causes which can be
accepted on the basis of fact and not faith has not yet been written.
It is a pleasure to acknowledge the courtesy extended by Dr Margaret O. Dayhoff
who supplied a preprint of the sequences of cytochrome c to be published in
Supplement 2 of the Atlas of Protein Sequences and Structure.
REFERENCES
BAHCALL, J. N. & DAVIS, R., JR. (1976). Science, N.Y. 191, 264.
BAR-NUN, A. & SHAVIV, A. (1975). Icarus 24, 197.
BERNAL, J. D. (1967). The Origin of Life. Cleveland and New York: The World Publishing Co.
BRACK, A. & ORGEL, L. E. (1975). Nature, Lond. 256, 383.
BREMERMANN, H. (1967). In Progress in Theoretical Biology, vol. 1. New York and London: Academic Press.
BRILLOUIN, L. (1962). Science and Information Theory, 2nd edn. New York: Academic Press.
BROOKS, J. & SHAW, G. (1973). Origin and Development of Living Systems. London and New York: Academic Press.
CALVIN, M. (1967). In Evolutionary Biology (T. Dobzhansky, M. K. Hecht & W. C. Steere, eds), vol. 1. New York: Appleton Croft.
CARTER, C. W., JR. & KRAUT, J. (1974). Proc. natn. Acad. Sci. U.S.A. 71, 283.
CHAITIN, G. J. (1966). J. ACM 13, 547.
CHAITIN, G. J. (1975). Sci. Am. 232, 47.
CHAITIN, G. J. (1974a). IEEE Trans. Info. Theory IT-20, 10.
CHAITIN, G. J. (1974b). J. ACM 21, 403.
CLAYTON, D. D. & WOOSLEY, S. E. (1974). Rev. Mod. Phys. 46, 755.
CRICK, F. H. C. (1966). Of Molecules and Men. Seattle and London: University of Washington Press.
CRICK, F. H. C. (1968). J. molec. Biol. 38, 367.
DAYHOFF, M. O. (1972). Atlas of Protein Sequences and Structure, 5. Nat. Biochem. Res. Found., Georgetown Univ. Med. Center, Wash., D.C.
DONN, W. L., DONN, B. D. & VALENTINE, W. G. (1965). Geol. Soc. Am. Bull. 76, 287.
EIGEN, M. (1971). Naturwissenschaften 58, 465.
EVERED, D. F. (1974). Nature, Lond. 252, 388.
GATLIN, L. L. (1972). In Proc. Sixth Berk. Sym. Math. Stat. Berkeley: University of California Press.
GRANTHAM, R. (1974). Science, N.Y. 185, 862.
HALDANE, J. B. S. (1928). Rationalist Annual 148, 3.
HULETT, H. R. (1969). J. theor. Biol. 24, 56.
KAPLAN, R. W. (1974). Rad. Environ. Biophysics 10, 31.
KHINCHIN, A. F. (1957). Mathematical Foundations of Information Theory. New York: Dover.
KIMURA, M. & OHTA, T. (1974). Proc. natn. Acad. Sci. U.S.A. 71, 2848.
KOLMOGOROFF, A. (1968). IEEE Trans. Info. Theory IT-14, 662.
KOLMOGOROFF, A. (1965a). Info. Trans. 1, 3.
KOLMOGOROFF, A. (1965b). Problemy peredaci informacii 1, 3.
LAWLESS, J. G. & BOYNTON, C. P. (1973). Nature, Lond. 243, 405.
LOHRMANN, R., SULSTON, J., ORGEL, L. E. & MILES, H. T. (1968). J. molec. Biol. 37, 151.
MARTIN-LÖF, P. (1966). Info. Control 9, 602.
MILLER, S. L. & ORGEL, L. E. (1974). The Origin of Life on Earth. Englewood Cliffs, N.J.: Prentice-Hall.
MISNER, C. W., THORNE, K. S. & WHEELER, J. A. (1970). Gravitation. San Francisco: W. H. Freeman.
MONOD, J. (1971). Chance and Necessity. New York: Alfred A. Knopf.
MUIR, M. D. & HALL, D. O. (1974). Nature, Lond. 252, 376.
DU NOÜY, LECOMPTE (1947). Human Destiny. New York: David McKay Co., Inc.
OHNO, S. (1973). Nature, Lond. 244, 259.
OPARIN, A. (1957). The Origin of Life. New York: Academic Press.
OPARIN, A. (1924). Proiskhozhdenie Zhizni. Izd. Moskovsky Rabochy.
POLANYI, M. (1968). Science, N.Y. 160, 1308.
PRIGOGINE, I., NICOLIS, G. & BABLOYANTZ, A. (1972). Physics Today 25, 23, 38.
PRIGOGINE, I. & NICOLIS, G. (1971). Q. Rev. Biophys. 4, 107.
QUASTLER, H. (1964). The Emergence of Biological Organization. New Haven and London: Yale University Press.
ROSSMANN, M. G., MORAS, D. & OLSEN, K. W. (1974). Nature, Lond. 250, 194.
SAGAN, C. & MULLEN, G. (1972). Science, N.Y. 177, 52.
SAGAN, C. & DRAKE, F. (1975). Sci. Am. 232, 80.
SALISBURY, F. (1969). Nature, Lond. 224, 342.
SCHOPF, J. W. & BARGHOORN, E. S. (1967). Science, N.Y. 156, 508.
SCHROEDINGER, E. (1955). What is Life? Cambridge: Cambridge University Press.
SHANNON, C. (1948). In The Mathematical Theory of Communication. Urbana, Ill.: The University of Illinois Press.
SHAPLEY, H. (1958). Of Stars and Men. Boston: Beacon Press.
SHKLOVSKII, I. S. & SAGAN, C. (1966). Intelligent Life in the Universe. New York: Dell Publishing Co.
SIMPSON, G. G. (1964). Science, N.Y. 143, 769.
SOLOMONOFF, R. J. (1964). Info. Control 7, 244.
SULSTON, J., LOHRMANN, R., ORGEL, L. E. & MILES, H. T. (1968). Proc. natn. Acad. Sci. U.S.A. 59, 726.
TRIMBLE, V. & REINES, F. (1973). Rev. Mod. Phys. 45, 1.
VOL'KENSTEIN, M. V. (1973). Usp. Fiz. Nauk 109; Soviet Phys. Usp. 16, 207.
WICKRAMASINGHE, R. H. & VILLEE, C. A. (1975). Nature, Lond. 256, 509.
WOESE, C. R. (1972). In Exobiology (C. Ponnamperuma, ed.). Amsterdam and London: North-Holland Pub. Co.
WOOTEN, J. C. (1974). Nature, Lond. 252, 542.
YAMANAKA, J. (1973). Space Life Sci. 4, 490.
YČAS, M. (1974). J. theor. Biol. 44, 145.
YOCKEY, H. P. (1974). J. theor. Biol. 46, 369.
YOCKEY, H. P. (1977a). J. theor. Biol. 67, 345.
YOCKEY, H. P. (1977b). J. theor. Biol. 67, 337.
