
IS THE GENE THE UNIT OF SELECTION?

IAN FRANKLIN* AND R. C. LEWONTIN


Department of Biology and Department of Mathematical Biology, University of Chicago,
Chicago, Illinois 60637
Received January 29, 1970

THE models of population genetics, which have remained almost unchanged for forty years, are most commonly criticized for ignoring the “natural” unit of selection, the genotype, in favor of the gene. This criticism is really an attack on one of the basic assumptions of population genetics theory, namely that the genotypic array in a random mating population, and evolutionary changes in that array, can be described in terms of gene frequencies at the individual loci. Undoubtedly there is a complex relationship between an individual genotype and its fitness, and some population geneticists feel that complexity per se destroys the usefulness of classical theory. But additivity (in the statistical sense) is not a crucial assumption: in 1918 FISHER showed how to partition additive and nonadditive effects of genes, and changes in gene frequency can be expressed in terms of these effects. On the other hand, if there is enough nonallelic interaction to induce stable correlations in allelic state between loci, the genotypic array can no longer be usefully described in terms of gene frequencies alone, since the frequency of a particular gametic type, say AbC, would not simply be the product of the separate gene frequencies p_A, p_b, p_C.
These correlations in allelic state between loci in the gametic pool, commonly referred to as linkage disequilibrium (LEWONTIN and KOJIMA 1960), are the principal subject of this paper.
For many years loci have been assumed to be in approximate linkage equilibrium, and this belief has in part been justified by exact two-locus theory. LEWONTIN and KOJIMA (1960) and BODMER and PARSONS (1961) showed that for symmetrical fitness models there will be stable linkage disequilibrium if there is considerable epistasis or very close linkage. KARLIN and FELDMAN (1969) did show, however, that for certain special relations among the fitnesses, looser linkage may also lead to a stable disequilibrium. Moreover, KIMURA (1965) claimed that for weak interaction and loose linkage, FISHER’S fundamental theorem of natural selection and WRIGHT’S concept of an adaptive topography hold approximately (see also WRIGHT 1967), although this does not hold true for tight linkage and strong selection. Since it seemed a priori that only a small fraction of all pairs of loci would be both polymorphic and either closely linked or strongly interacting, the classical concepts of population genetics were not seriously threatened.

¹ The research described here was supported by the Atomic Energy Commission Contract AT(11-1)-1437.
* Present address: Division of Animal Genetics, Commonwealth Scientific and Industrial Research Organization, P.O. Box 90, Epping, New South Wales, 2121 Australia.

Genetics 65: 707-734 August 1970.


However, a number of recent observations suggest that linkage disequilibrium
may be more common than previously envisaged.
1. The apparent ubiquity of polymorphic loci in a variety of organisms implies a high density of segregating loci per map unit, so that many pairs of polymorphic loci must be closely linked. For example, PRAKASH, LEWONTIN and HUBBY (1969) estimate that 40% of all structural genes are polymorphic in Drosophila pseudoobscura. If there are 5,000 loci in this species (a conservative estimate) and a total map length of 250 centimorgans, there will be 8 polymorphic loci per centimorgan. Taking into account the lack of recombination in males, the average recombination fraction between adjacent polymorphic loci is .0006.
2. Some kinds of natural selection generate very large amounts of epistasis. LEWONTIN (1964b) showed that various forms of selection for an intermediate optimum phenotype created sufficient epistasis to produce stable linkage disequilibrium even for genes on different chromosomes. Also, the models of natural selection discussed by KING (1967) and SVED, REED and BODMER (1967) induce much more epistasis than multiplicative models.
3. Linkage disequilibrium may arise from causes other than selection, in particular finite population size (HILL and ROBERTSON 1968; OHTA and KIMURA 1969; SVED 1968). Even neutral alleles that are segregating may show considerable linkage disequilibrium if they are sufficiently tightly linked. The squared correlation in gene frequency for segregating neutral loci appears to be approximately equal to 1/(4Nr), where r is the recombination fraction between the loci in question and N the population size (HILL and ROBERTSON 1968). Also, departures from random mating, such as positive assortative mating, may induce correlations in gene frequency.
4. LEWONTIN (1964a,b) showed that genes quite far apart on the chromosome will be held together in linkage disequilibrium by genes segregating between them. That is, the disequilibrium between loci 1 and 2 and the disequilibrium between loci 2 and 3 will result in a disequilibrium between loci 1 and 3, even though, in the absence of locus 2, these distant genes would not be correlated. However, genes on either side of a long interval within which there is no interacting locus will not be in linkage disequilibrium with each other.
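The density argument in item 1 reduces to a short calculation. Taking 1 cM as a recombination fraction of .01 (adequate at such small distances), adjacent polymorphic loci are on average 1/8 cM apart in female meiosis, and averaging over the two sexes with no crossing over in males halves that figure:

$$
\frac{0.4 \times 5000\ \text{loci}}{250\ \text{cM}} = 8\ \text{polymorphic loci per cM},
\qquad
\bar r \approx \frac{1}{2}\times\frac{0.01}{8} = 0.000625 \approx .0006 .
$$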
Finally, there is the finding to which this paper is directed: that two-locus theory seriously underestimates the intensity of linkage disequilibrium between loci in a multilocus segregation. We will show that epistatic interactions that are small for any pair of loci considered alone can accumulate nonadditively in such a manner that loci far apart on the chromosome can be in marked linkage disequilibrium. In fact, it is possible for all the loci on an entire chromosome or chromosome arm to be highly correlated in their allelic distribution.
This phenomenon is not to be confused with the cumulative effect discussed by LEWONTIN (1964a,b). The earlier finding was that loci far apart on the chromosome are held out of linkage equilibrium with each other by loci between them on the map. This is a result to be expected from the simplest ideas of correlation. The phenomenon explored in the present paper is quite different. Here, two adjacent loci are held in much higher correlation when embedded in a chromosome containing other loci interacting with them than when they are considered in isolation. Such a result does not follow from simple considerations of correlation and arises from higher-order interactions that do not exist in the two-locus case.

TABLE 1
Correlation between two adjacent loci (2 and 3) in a 5-locus model,* compared with predicted correlation from 2-locus theory

Map distance   Observed   Predicted   Ratio
.01            .967       .916        1.06
.02            .929       .825        1.13
.03            .883       .721        1.22
.04            .823       .600        1.37
.05            .736       .447        1.64
.06            .581       .200        2.91
.of3           .481       0           ∞
.ow            .367       0           ∞

* From LEWONTIN 1964a, Table 9.
Some of the results presented in this paper can be anticipated from the results of 5-locus calculations made by LEWONTIN (1964a). Table 1 shows the equilibrium linkage relations for different map distances, expressed as the correlation between loci,* for a pair of adjacent loci (locus 2 and locus 3) in a 5-locus multiplicative fitness model (Table 9, LEWONTIN 1964a). The model has a symmetry such that the correlation for two loci can be calculated from exact 2-locus theory. The last column gives the ratio of observed to predicted correlation. The table shows that as the map distance between the loci is increased, the effect of embedding adjacent loci in a multilocus chromosome grows greater and greater. The effect becomes relatively most extreme for map distances above .0625, where exact 2-locus theory predicts no correlation. We see two effects, then. The correlation between loci is greater than expected from 2-locus theory, and the critical map distance, above which there is no effect of linkage, is increased, although only slightly.
It is the purpose of this paper to push these two observations much farther
by increasing greatly the number of loci and examining a variety of selection and
linkage models in an approach to a realistic model of the genome. The results
turn out to be rather surprising.

METHODS

While exact numerical computations of changes in the genetic composition of a population are possible with a small number of loci, a different attack must be used when dozens or hundreds of loci are involved. With only two alleles per locus, there are 2^l gametic types in an l-locus model, which involves manipulation of a matrix of that order. For 30 loci there are approximately 10^9 gametic types. We have therefore resorted to Monte Carlo simulation for more than five loci. Since we are not primarily interested in the effect of small population size, we have used as large a population size as was reasonable given the limited computer time available. Our primary attempt was to derive essentially deterministic results by minimizing the stochastic effects. Finiteness of population size does turn out to be of some interest, however.

* See section on Measures of linkage disequilibrium for calculation of the correlation.
Monte Carlo simulation experiments were carried out using a general-purpose program written for the IBM 7094 at the University of Chicago. This program consists of a main deck written in FORTRAN, which is primarily responsible for input and output, and a set of subroutines written in assembly language concerned with mating, recombination, evaluation of phenotype, and selection. Each of these subroutines has a set of optional decks which allow different mating schemes and patterns of selection. The options used to generate the results given in this paper were chosen so that the simulated populations had the following properties:
1) Population size
Males chosen in sequence and females chosen at random from an array in store are used to generate progeny, which are then selected on the basis of their fitness. Progeny are generated until a population size N (equal to the number of parents) is attained. Because males are chosen in sequence, they have virtually no variance in offspring number due to Poisson sampling, while the females do have such variance, so that the effective population size is nearly 4N/3.
2) Recombination
Crossovers are generated with no interference: the number of crossovers in a segment with map length λ morgans is calculated by sampling from a Poisson distribution with mean λ, and the position of each crossover is determined by sampling from a uniform distribution on [0, λ].
3) Selection
a) Multiplicative fitnesses
The fitness of a genotype is the product of the fitnesses at the individual loci; each locus is assumed to have an identical effect on fitness. We assign to each heterozygote a fitness W_e, and to homozygotes 0/0 and 1/1 the values W_0 and W_1. Then an individual with n_1 loci homozygous 0/0, n_2 loci heterozygous, and n_3 loci homozygous 1/1 will have a fitness W_0^(n_1) W_e^(n_2) W_1^(n_3). This value is compared to a uniform random variable between 0 and W_e^n, where n = n_1 + n_2 + n_3. If the genotypic value is greater than the random variable, the individual is saved; if less, it is discarded.
b) Proportional selection
Following a model of KING (1967), which postulates that a fixed proportion of the generated individuals survives, a score is constructed by adding the number of heterozygous loci in an individual to a random normal deviate. The individual is saved if this phenotypic score is greater than a truncation point computed each generation so as to save a proportion R of the population. The random number is chosen from a normal distribution with mean 0 and variance equal to C times the variance of the number of heterozygous loci in the initial population. The initial heritability is therefore 1/(C + 1).
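The mating, recombination and selection options in 1) to 3) can be sketched in modern Python as below. This is a minimal sketch under stated assumptions, not the original FORTRAN and assembly program: loci are assumed to be evenly spaced r morgans apart, surviving progeny are assumed to be split between the sexes by alternation, and all function names are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def recombine(rng, g1, g2, r):
    """One meiosis: number of crossovers ~ Poisson(lam), positions ~ U[0, lam],
    no interference. Loci are assumed evenly spaced r morgans apart, so the
    segment has map length lam = r * (n - 1)."""
    lam = r * (len(g1) - 1)
    cuts = sorted(rng.uniform(0.0, lam) for _ in range(poisson(rng, lam)))
    from_g1 = rng.random() < 0.5  # strand on which the gamete starts
    child, c = [], 0
    for i in range(len(g1)):
        while c < len(cuts) and cuts[c] < i * r:  # crossovers left of locus i
            from_g1 = not from_g1
            c += 1
        child.append(g1[i] if from_g1 else g2[i])
    return child


def survives_multiplicative(rng, a, b, w0, we, w1):
    """Option 3a: fitness w0**n1 * we**n2 * w1**n3 compared with U(0, we**n).
    Assumes the heterozygote value we is the largest, so we**n is the maximum."""
    n1 = sum(1 for x, y in zip(a, b) if x == y == 0)
    n3 = sum(1 for x, y in zip(a, b) if x == y == 1)
    n2 = len(a) - n1 - n3
    fitness = w0 ** n1 * we ** n2 * w1 ** n3
    return fitness > rng.uniform(0.0, we ** len(a))


def truncation_select(rng, zygotes, R, noise_var):
    """Option 3b: score = number of heterozygous loci + N(0, noise_var);
    keep the top fraction R. noise_var stands for C times the initial Var(H)."""
    sd = math.sqrt(noise_var)
    scores = [sum(x != y for x, y in zip(a, b)) + rng.gauss(0.0, sd)
              for a, b in zygotes]
    keep = max(1, round(R * len(zygotes)))
    cutoff = sorted(scores, reverse=True)[keep - 1]
    return [z for z, sc in zip(zygotes, scores) if sc >= cutoff]


def next_generation(rng, males, females, N, r, w0, we, w1):
    """Option 1 with 3a: fathers in strict rotation, mothers at random,
    progeny produced until N survive, then split between the sexes by
    alternation (the split is a bookkeeping assumption)."""
    survivors, m = [], 0
    while len(survivors) < N:
        dad, mom = males[m % len(males)], rng.choice(females)
        m += 1
        a = recombine(rng, dad[0], dad[1], r)
        b = recombine(rng, mom[0], mom[1], r)
        if survives_multiplicative(rng, a, b, w0, we, w1):
            survivors.append((a, b))
    return survivors[0::2], survivors[1::2]


rng = random.Random(1)
n_loci = 36
pop = [([rng.randint(0, 1) for _ in range(n_loci)],
        [rng.randint(0, 1) for _ in range(n_loci)]) for _ in range(20)]
males, females = pop[:10], pop[10:]
males, females = next_generation(rng, males, females, 20, 0.0025, 0.9, 1.0, 0.9)
```

Taking fathers in rotation and mothers at random is what gives the females, but not the males, Poisson variance in offspring number in the absence of selection.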
Because the word size of the IBM 7094 is 36 bits, it is convenient when dealing with a large number of loci to consider a multiple of 36.
Pseudorandom sequences are generated by the power residue method using a base of 3^19. We have tested the sequences for the frequency of single integers, pairs, triplets, and quadruplets. No autocorrelations have been detected.
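A power residue generator of this kind is a multiplicative congruential generator, x_(k+1) = a x_k (mod m). The sketch below takes a = 3^19 from the text; the modulus m = 2^35 is an assumption (a natural choice for a 36-bit word with one sign bit), and the single-integer frequency count is only an illustration of the simplest of the tests mentioned.

```python
class PowerResidue:
    """Multiplicative congruential ("power residue") generator x <- a*x mod m.

    a = 3**19 is the base given in the text; m = 2**35 is an assumption.
    Since a = 3 (mod 8), an odd seed gives the maximal period 2**33.
    """

    A = 3 ** 19
    M = 2 ** 35

    def __init__(self, seed=1):
        if seed % 2 == 0:
            raise ValueError("seed must be odd")
        self.x = seed % self.M

    def next_int(self):
        self.x = (self.A * self.x) % self.M
        return self.x

    def random(self):
        """Uniform variate on [0, 1)."""
        return self.next_int() / self.M


def frequency_test(gen, bins=10, draws=100_000):
    """Counts of the single integers floor(bins * u); pairs, triplets and
    quadruplets would be tested the same way on consecutive draws."""
    counts = [0] * bins
    for _ in range(draws):
        counts[int(bins * gen.random())] += 1
    return counts


print(frequency_test(PowerResidue(12345), bins=10, draws=10_000))
```

Because both the multiplier and the seed are odd, every state stays odd and the generator can never fall into the absorbing state 0.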
For 5-locus results, direct numerical calculation of equilibria was made using the method of
LEWONTIN (1964a).
The original computer program, modified by us, and the tests of random numbers were the work of our former colleague PROFESSOR MADHO SINGH of the State College, Oneonta, N.Y.

RESULTS

Initial 36-locus models


FIGURE 1. Results of initial simulations with 36 loci. Curves 1 and 2: r = 0.0; curve 7: r = .005; curves 3, 4, 5, and 6: r = .0025, where r is recombination between adjacent loci. Abscissa is generation number (0 to 500); ordinate is mean fitness.

An initial survey of a range of recombination values was made for a symmetrical overdominant multiplicative model with W_0 = 0.9, W_e = 1, and W_1 = 0.9. This is strong selection at each locus, and the mean fitness W̄ of a population at gene frequency equilibrium would be (0.95)^36 = .1577 if the loci were independently segregating. The starting population consisted of a set of 600 (2N) gametes chosen at random from a population with all gene frequencies equal to 0.5 and in linkage equilibrium. The effective population size is 400. Figure 1 shows W̄ plotted against time, in generations, for three different values of r, the recombination fraction between adjacent loci. For r = 0.0 (complete linkage) there is a small but unremarkable increase in W̄ (curves 1 and 2). In contrast, at r = .0025 there was a marked increase in W̄ in all four replicates (curves 3, 4, 5, 6), accompanied by a reduction in the number of gametic types represented. At a higher recombination value, r = .005 (curve 7), no increase in W̄ was observed. Some simulations were also carried out with free recombination between loci (r = 0.5), and W̄ remained at the theoretical level of 0.158 for populations in linkage equilibrium. Apparently there is an intermediate optimum recombination fraction that minimizes the genetic load, maximizing fitness. How can that be?

The explanation is given in Table 2, which shows the gametic composition of the populations at equilibrium. Each line of 0's and 1's is the representation of the 36 loci along the chromosome for one gametic type in the gamete pool at equilibrium. The frequencies are ten-generation averages after 100 generations of equilibrium. For r = 0.0, the entire population consists of very few (3 and 5) gametic types, each in fairly high frequency. No two of these make a perfectly balanced pair, in the sense that there would be heterozygosity at every locus if only those two gametic types were in the population. The complete set
of 3 (rep 1) or 5 (rep 2) gametic types does, however, make a balanced set since, except for the two loci starred in rep 1, all the loci are segregating when all types are in the population. Since no two gametes make a balanced pair, no individual in the population is a complete or nearly complete heterozygote. All individuals, even those heterozygous for gametic types, are homozygous at many loci (a minimum of 10 in rep 1), so the mean fitness of the population is not raised much despite the tremendous restriction in the number of gametic types. This phenomenon is a result of the finiteness of the population. Since there is absolutely no recombination, each gametic type is an “allele,” and random drift causes the elimination of all but a few “alleles.” If the “alleles” do not include a perfectly balanced pair, there is no way to recover such a pair.

TABLE 2
Equilibrium gametic arrays from initial simulations
Each line is a gametic type. 0 and 1 are alternate alleles. Asterisks mark fixed loci.
The 36 loci are spaced in 12 groups of 3 for ease of reading only.

                        Gamete                                            Frequency
(a) r = 0.0
                                 *                          *
rep 1                   101 000 011 001 100 000 111 011 101 110 101 101   .454
                        010 110 111 010 011 101 001 000 110 101 110 010   .391
                        100 111 110 101 011 011 100 100 010 100 011 111   .155
rep 2                   001 010 101 100 011 001 101 100 101 110 010 010   .280
                        110 011 000 111 001 110 011 010 110 110 100 000   .172
                        011 100 110 001 100 111 111 110 000 011 101 101   .346
                        000 011 101 011 111 001 100 011 001 000 011 101   .024
                        100 011 011 101 110 101 000 010 111 000 011 111   .178

(b) r = .0025
                                    **                *       *
rep 1 (generation 300)  011 010 110 011 000 110 101 011 011 110 001 101   .411
                        100 101 001 010 111 001 010 101 100 000 110 010   .424
                        others                                            .165
                                                  *
rep 2 (generation 420)  001 100 100 011 101 011 111 100 100 111 000 111   .440
                        110 011 011 100 010 100 001 011 011 000 111 000   .427
                        others                                            .133
When there is a small amount of recombination (r = .0025), recombination between the “alleles” can occur to produce perfectly balanced pairs (disregarding loci completely fixed in the population). Table 2b shows such pairs making up 85% of the gametic array in each replicate. In rep 2, for example, 38% of the population will be heterozygotes for the two main gametic types and therefore heterozygous for 35 loci. W̄ will be high in such a population. Examination of the detailed output from the computation shows that the partial plateau for curve 5 shown in Figure 1 between generations 260 and 320, followed by the steep rise to the new plateau of high fitness, is the result of a partially balanced set of chromosomes being replaced by a balanced set after a rare recombination gave rise to such a set.
The fitness among simulations with r = .0025 does not rise higher because recombination is constantly generating unbalanced types (“others” in Table 2). This is a recombination load. If the population were infinite in size, then a perfectly balanced set could be selected when r = 0.0 and no recombination load would occur. The mean fitness of the population would then be W̄ = (.5)(1.0) + (.5)(.9)^36 = .511, the maximum that can be achieved by linkage. Thus the appearance of an optimal recombination fraction is a result of the finiteness of population size, while for an infinite population the tighter the linkage, the higher the mean fitness at equilibrium. This is in agreement with the results of 5-locus models (LEWONTIN 1964a).
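The mean fitness figures in this section follow from summing over all unions of gametes. A minimal sketch for the symmetric multiplicative model (homozygote fitness .9 and heterozygote fitness 1 at every locus), assuming random union of gametes; `mean_fitness` is a hypothetical helper, and the example uses 3 loci so that the full gametic array can be enumerated.

```python
from itertools import product


def mean_fitness(gametes, w_hom=0.9, w_het=1.0):
    """Mean fitness of zygotes formed by random union of gametes.

    gametes: list of (alleles, frequency) with frequencies summing to 1.
    Each homozygous locus contributes a factor w_hom, each heterozygous
    locus a factor w_het (symmetric multiplicative model).
    """
    total = 0.0
    for (g1, f1), (g2, f2) in product(gametes, repeat=2):
        w = 1.0
        for x, y in zip(g1, g2):
            w *= w_het if x != y else w_hom
        total += f1 * f2 * w
    return total


# All 8 gametic types at equal frequency: 3 loci in linkage equilibrium.
equilibrium = [(g, 1 / 8) for g in product((0, 1), repeat=3)]
# A perfectly balanced pair of gametic types.
balanced = [((0, 0, 0), 0.5), ((1, 1, 1), 0.5)]
print(mean_fitness(equilibrium), mean_fitness(balanced))
```

For linkage equilibrium this returns (0.95)^3 up to rounding, and for the balanced pair .5 + .5(.9)^3: half the zygotes are complete heterozygotes and half are complete homozygotes.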
A result that is not obvious from the tables of gametic types, but which appears in the full output of the computer calculations, is the very close adherence of gene frequencies at each locus to the theoretical infinite-population value of 0.5. Except for the occasional fixed locus, and except for the case of r = 0, where chromosomes once lost can never be replaced, the variance in gene frequency among loci is distinctly smaller for tightly linked cases than for free recombination. For the two replicates shown in Table 2b, the variance of gene frequencies (discounting fixed loci) is .000177 and .000140, respectively, as compared with .00239 for a case with r = 0.5.
Large numbers of loci, each with fairly large selection coefficients, can then be kept segregating with a much lower genetic load, provided they are closely enough linked. In assessing whether r = .0025 is a reasonable map distance, it should be remembered that this is four times the average distance we estimated between segregating loci in D. pseudoobscura.

Measures of linkage disequilibrium


Before going on to discuss more complete results, we need some useful descriptions of the gametic array in a population. In addition to the allelic frequencies at each locus (which will nearly always be close to .50 in our models), many linkage disequilibrium parameters are needed to completely specify the genotypic array for multilocus systems. With 36 loci there are 36 × 35/2 = 630 parameters necessary to specify disequilibrium between pairs of loci, 7140 parameters describing disequilibrium between triples, etc. Since linkage disequilibrium is most clearly understood for pairs of loci (in fact there is no satisfactory theory for three linked loci with selection), and since we expected most of the deviations in frequency to be accounted for by first-order interactions, we will only consider functions of D, defined as g11g00 − g10g01, where g11 and g00 are the frequencies of the coupling gametes and g01 and g10 the frequencies of the repulsion gametes with respect to the two loci being considered (see LEWONTIN and KOJIMA 1960). Even narrowing down to 2-locus effects, there are 630 separate D's, and we need to consider some kind of average value. In the subsequent discussions we will use two.
1) D*, the average of the absolute values of D over all 35 pairs of adjacent loci. Because we are concerned primarily with symmetrical models, there is no meaningful distinction between positive and negative values of D. The most dramatic effects on disequilibrium occur for loci that are tightly linked; hence the D's between adjacent loci will be most sensitive to changes in linkage disequilibrium in the simulated populations.
2) ρ̄², the average squared correlation coefficient between genes over all n(n − 1)/2 pairs of loci. The correlation between genes is defined as follows. A pair of random variables (X, Y) is assigned to the pair of loci. The random variables will each take the value 0 or 1 depending upon the allele at that locus in each gamete. Thus the pairs (0,0), (0,1), (1,0), and (1,1) correspond to the gametes ab, aB, Ab, and AB, respectively, and have the frequencies g00, g01, g10, and g11 in the gamete pool. The correlation in question is that between X and Y in the gamete pool. The correlation between a pair of loci is related to D simply as

$$\rho = \frac{D}{\sqrt{p_1 p_2 (1-p_1)(1-p_2)}} \qquad (1)$$

where p1 and p2 are the allelic frequencies at the two loci. With gene frequencies at each locus equal to 0.5, ρ² = 16D². This measure has a range between 0 and 1, since |D| cannot exceed .25. The reasons for using the average value of ρ² will be discussed later in this paper.
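Both summary measures can be computed directly from a gamete pool when the number of loci is small. A minimal sketch, assuming the pool is given as (allele tuple, frequency) pairs and that every locus is segregating (a fixed locus would make ρ undefined); the helper names are hypothetical.

```python
import math
from itertools import combinations


def allele_freq(pool, i):
    """Frequency of allele 1 at locus i; pool is [(alleles, frequency), ...]."""
    return sum(f for alleles, f in pool if alleles[i] == 1)


def pair_D(pool, i, j):
    """D = g11*g00 - g10*g01 for loci i and j (coupling minus repulsion)."""
    g = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    for alleles, f in pool:
        g[(alleles[i], alleles[j])] += f
    return g[(1, 1)] * g[(0, 0)] - g[(1, 0)] * g[(0, 1)]


def rho(pool, i, j):
    """Equation (1): correlation of allelic state between loci i and j."""
    p1, p2 = allele_freq(pool, i), allele_freq(pool, j)
    return pair_D(pool, i, j) / math.sqrt(p1 * p2 * (1 - p1) * (1 - p2))


def d_star(pool):
    """Average |D| over the n - 1 pairs of adjacent loci."""
    n = len(pool[0][0])
    return sum(abs(pair_D(pool, i, i + 1)) for i in range(n - 1)) / (n - 1)


def mean_rho_sq(pool):
    """Average squared correlation over all n(n - 1)/2 pairs of loci."""
    n = len(pool[0][0])
    pairs = list(combinations(range(n), 2))
    return sum(rho(pool, i, j) ** 2 for i, j in pairs) / len(pairs)


complementary = [((0, 0, 0, 0), 0.5), ((1, 1, 1, 1), 0.5)]
print(d_star(complementary), mean_rho_sq(complementary))  # 0.25 1.0
```

Two complementary gametic types at frequency .5 give the extreme case: every D is .25, so D* = .25 and ρ̄² = 1.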
Because of the large number of pairwise combinations, it is impractical to make a direct computation of D* or ρ̄² when large numbers of loci are involved. We have estimated ρ̄² using a relation derived by SVED (1968). If the number of loci heterozygous in an individual is H and the number of loci is n, then

$$\mathrm{Var}(H) = \frac{n}{4} + 8\sum_{i<j} D_{ij}^{2} \qquad (2)$$

when gene frequencies are .5 at each locus. Since ρ² = 16D² at these frequencies, the sum in (2) equals n(n − 1)ρ̄²/32, and solving for ρ̄² gives

$$\bar{\rho}^{2} \approx \frac{4\,\mathrm{Var}(H) - n}{n(n-1)} \qquad (3)$$

In all the results, unless recombination is completely lacking (r = 0), and except for an occasional locus that goes to complete fixation, all loci maintain gene frequencies extremely close to .5, so that approximation (3) is very good.
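Equations (2) and (3) can be checked on a small pool. The sketch below computes Var(H) exactly for zygotes formed by random union of gametes and then applies (3). It assumes gene frequencies of exactly .5, and `heterozygosity_variance`, `var_h_from_population` and `sved_mean_rho_sq` are hypothetical names; in a simulation, Var(H) would instead be taken over the individuals actually present.

```python
import statistics
from itertools import product


def heterozygosity_variance(pool):
    """Exact Var(H) for zygotes formed by random union of gametes in the pool.
    pool is [(alleles, frequency), ...] with frequencies summing to 1."""
    mean = mean_sq = 0.0
    for (g1, f1), (g2, f2) in product(pool, repeat=2):
        h = sum(x != y for x, y in zip(g1, g2))
        mean += f1 * f2 * h
        mean_sq += f1 * f2 * h * h
    return mean_sq - mean * mean


def var_h_from_population(individuals):
    """Var(H) over the diploids actually present, as in a simulation."""
    hs = [sum(x != y for x, y in zip(a, b)) for a, b in individuals]
    return statistics.pvariance(hs)


def sved_mean_rho_sq(var_h, n):
    """Equation (3); assumes gene frequencies of .5 at all n loci."""
    return (4 * var_h - n) / (n * (n - 1))


pool = [((0, 0, 0, 0), 0.25), ((1, 1, 1, 1), 0.25),
        ((0, 1, 0, 1), 0.25), ((1, 0, 1, 0), 0.25)]
v = heterozygosity_variance(pool)
print(v, sved_mean_rho_sq(v, 4))
```

In this pool two of the six pairs of loci are completely correlated and the other four are uncorrelated, so the direct average ρ̄² is 1/3; the sketch gives Var(H) = 2 and the estimate (4·2 − 4)/(4·3) = 1/3, as (2) and (3) require.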

Existence of multiple equilibria


Since the first simulations showed that small, but reasonable, values of recombination in 36-locus systems would lead to very high linkage disequilibrium, a more thorough investigation of these cases was undertaken. Population size was increased somewhat, to N_e = 667, to reduce random fixation at separate loci. Figure 2a shows the result of two replicate runs for r = .003 and W_0 = W_1 = 0.9. In both cases there was a very slight rise in W̄ over the first 300 generations, followed by a marked rise starting at about generation 300 from W̄ = 0.15 to W̄ = 0.41. Figure 2b shows that despite the very slight change in W̄ for the first 300 generations, the gametic structure has been undergoing a constant and considerable change which is not reflected in mean fitness until later generations. While replicate 1 showed a linear increase in D*, replicate 2 attained a plateau between 160 and 240 generations with D* ≈ .09 before its final sharp rise. Both replicates, however, reach a final value of D* = 0.245, which is almost complete linkage disequilibrium (ρ̄² = .96). As in the earlier runs, two gametic types make up 80% of the gametic pool, and unfixed gene frequencies are very close to .5 at all loci.

FIGURE 2. (a) Changes in W̄ (ordinate) over time in generations (abscissa) for two replicates with r = .003 and W_0 = W_1 = 0.9. (b) Changes in linkage association, D* (ordinate), over time for the same populations as in 2a.
The presence of a plateau at D* = .09 in replicate 2 suggests the possibility of more than one stable point for D*, with a random event moving the system from this lower stable point into the domain of attraction of the higher one. This is in fact the case. Figure 3 shows the results of simulation with slightly looser linkage, r = .005. The first run started in linkage equilibrium and after 300 generations reached a plateau at D* = .09. Three other simulations, with initial values of D* = 0.25, 0.16, and 0.125, show that there are indeed two stable points, one at D* = .24 and one at D* = .09, with the unstable point between them at approximately .14.

FIGURE 3. Four simulations with 36 loci, N = 667, r = .005, W_0 = W_1 = 0.9, but starting from different initial amounts of linkage disequilibrium. Linkage disequilibrium D* (ordinate) is shown over time in generations (abscissa).
As a further check on the existence of more than one stable point, and in order to discriminate the effect of finite population size in the Monte Carlo simulation, we have examined the same selection model for five loci, where exact results can be computed. Figure 4 shows the result of computing trajectories of D* for five loci with W_0 = W_1 = 0.9 at each locus and r = .002, .003, .004, and .005. The 2-locus exact theory predictions for these cases are that equilibrium D* = .112 for r = .002 but D* = 0 for the looser linkages, since r = .0025 is the critical value of linkage from 2-locus theory. Figure 4 shows, as we have already demonstrated, that D* is greater for five loci than for two loci, with D* = .220 where r = .002 and D* = .187 for r = .003. These are both stable points, since the trajectories converge from above and below. In addition, however, D* = 0 is also a stable point for r = .003, since the trajectory with an initial value of D* = .05 shows a decrease in D* with time. For r = .003 there are two stable points, D* ≈ .185 and D* = 0, with the unstable point between them at D* ≈ .07. Figure 4 illustrates a general feature of multilocus symmetrical models. The range of r is divided into three regions. For very small r there is a single stable point with D* large. For large r there is a single point with D* = 0. For an intermediate range, there are two stable points, one with D* large and one with D* = 0.

FIGURE 4. Five-locus deterministic numerical solutions with W_0 = W_1 = 0.9, various amounts of recombination between adjacent loci, and various initial amounts of linkage correlation. Ordinate is D*; abscissa is generation (0 to 500). Solid lines: r = .003; broken line: r = .002; dotted line: r = .004.
The existence of simultaneous stable equilibria with D = 0 and D ≠ 0 has not been found in 2-locus theory. Both LEWONTIN and KOJIMA (1960) and KARLIN and FELDMAN (1969) found multiple stable equilibria for symmetric models, and LEWONTIN (1964a) found multiple equilibria numerically in asymmetric cases, but all such multiple equilibria had D ≠ 0.

The effect of finite size


Returning to Figure 3, we see that the lower stable point is not at D* = 0, unlike in the 5-locus calculation. This difference, however, is a result of finite population size. The population cannot remain at D* = 0 because in any finite population linkage disequilibrium is generated by sampling.
SVED (1968) has described a method for calculating the expected linkage disequilibrium between two loci, provided that the gene frequencies are held constant at an intermediate value. In particular, he showed that for gene frequencies at each locus equal to 0.5, and with no epistasis,

$$E(D^2) = \frac{1}{16(4Nr + 1)} \qquad (4)$$

where N = effective population size and r = recombination fraction between the two loci.
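This formula, which is in close agreement with the result that OHTA and KIMURA (1969) obtained by a different approach, has to be modified slightly in our case to allow for the epistasis generated by the multiplicative model. Following SVED, if we let z equal the sum of the frequencies of the repulsion gametes (g10 + g01), the distribution of z at equilibrium is approximately

$$\phi(z) = \frac{C}{V(z)}\exp\left\{2\int\frac{\Delta(z)}{V(z)}\,dz\right\} \qquad (5)$$

where Δ(z) is the expected change in z in one generation, V(z) is the sampling variance of z, and C is a constant. Now,

$$\Delta(z) = \Delta(g_{10}) + \Delta(g_{01}) = \left(z-\tfrac{1}{2}\right)\left[s^{2}z(1-z) - r\right]\big/\bar{W} \approx \left(z-\tfrac{1}{2}\right)\left[s^{2}z(1-z) - r\right] \qquad (6)$$

where s = 1 − W, W being the homozygote fitness at each locus (W_0 = W_1 = W), and

$$V(z) = \frac{z(1-z)}{2N} \qquad (7)$$

so that

$$\phi(z) = C\,e^{-2Ns^{2}z(1-z)}\,\big[z(1-z)\big]^{2Nr-1} \qquad (8)$$

Since D = ½(½ − z) (assuming all gene frequencies = 0.5),

$$E(D^{*}) = 2\int_{0}^{1/2}\tfrac{1}{2}\left(\tfrac{1}{2}-z\right)\phi(z)\,dz \qquad (9)$$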
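The passage from (5) to (8), and the two-locus equilibrium implied by (6), can be written out as follows. This is a worked sketch using the approximate form of (6), with W̄ ignored:

$$
\begin{aligned}
2\int\frac{\Delta(z)}{V(z)}\,dz
 &= 4Ns^{2}\int\left(z-\tfrac{1}{2}\right)dz \;-\; 4Nr\int\frac{z-\tfrac{1}{2}}{z(1-z)}\,dz \\
 &= -2Ns^{2}\,z(1-z) \;+\; 2Nr\,\ln\big[z(1-z)\big] + \text{const},
\end{aligned}
\qquad
\phi(z) \propto \frac{e^{-2Ns^{2}z(1-z)}\,\big[z(1-z)\big]^{2Nr}}{z(1-z)} .
$$

Setting Δ(z) = 0 in (6) gives either z = ½ (D = 0) or z(1 − z) = r/s², that is

$$
|D| = \tfrac{1}{2}\left|\tfrac{1}{2}-z\right| = \tfrac{1}{2}\sqrt{\tfrac{1}{4}-\frac{r}{s^{2}}}, \qquad r < \frac{s^{2}}{4}.
$$

Near z = ½, Δ(z) ≈ (z − ½)(s²/4 − r), so D = 0 is unstable exactly when r < s²/4. With s = .1 the critical value is r = .0025, and at r = .002 the formula gives |D| = ½√.05 ≈ .112, the 2-locus values used in the five-locus comparison. Setting s = 0 in (8) leaves a symmetric beta density with both parameters equal to 2Nr, for which Var(z) = 1/[4(4Nr + 1)]; since E(D²) = Var(z)/4, this recovers (4).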

We have evaluated the above expression numerically. For N = 667 and s = .1, we have

E(D*) = .0962 for r = .003
E(D*) = .0677 for r = .005

The predicted linkage disequilibrium for r = .003 falls far short of the observed value. At the recombination fraction r = .005, we have noted that there are apparently at least two stable equilibria, and the lower one does not differ greatly from that predicted by equation (9). The other equilibrium (D* ≈ .24) shows much stronger linkage disequilibrium and cannot be explained by genetic drift alone.
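Equation (9) has no closed form, but it is easy to evaluate by quadrature. A minimal sketch, assuming 2Nr ≥ 1 so that (8) is bounded at z = 0; the function name `d_moment` and the composite Simpson rule are our choices, not necessarily the method originally used.

```python
import math


def d_moment(N, s, r, k=1, steps=2000):
    """E(|D|**k) under the stationary density (8) of z, with D = (1/2)(1/2 - z).

    Integrates over 0 <= z <= 1/2 only, using the symmetry of (8) about
    z = 1/2, with a composite Simpson rule. Assumes 2*N*r >= 1.
    """
    a = 2 * N * r - 1   # exponent of z(1 - z) in (8)
    b = 2 * N * s * s   # coefficient in the exponential factor of (8)
    if a < 0:
        raise ValueError("this sketch needs 2*N*r >= 1")
    if steps % 2:
        steps += 1      # Simpson's rule needs an even number of panels

    def phi(z):         # unnormalized density (8); the constant C cancels
        u = z * (1.0 - z)
        return math.exp(-b * u) * u ** a

    def simpson(f, lo, hi):
        h = (hi - lo) / steps
        total = f(lo) + f(hi)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * f(lo + i * h)
        return total * h / 3

    norm = simpson(phi, 0.0, 0.5)
    moment = simpson(lambda z: (0.5 * (0.5 - z)) ** k * phi(z), 0.0, 0.5)
    return moment / norm


for r in (0.003, 0.005):
    print(r, round(d_moment(667, 0.1, r), 4))
```

With N = 667 and s = .1 this gives approximately .096 for r = .003 and .068 for r = .005. Setting s = 0 and k = 2 reproduces the drift-only value (4), which is a convenient check on the quadrature.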
Clearly other factors need to be invoked to explain the strong linkage disequilibrium observed in these simulation experiments, although genetic drift could account for some of the initial rise in D*, especially at the looser recombination value.

The contribution of higher-order interactions to disequilibrium between pairs of loci
One of the consequences of correlations in gene frequency is a change in the selective differences among genotypes at the loci concerned. In particular, linkage disequilibrium among a set of heterotic loci reinforces the heterosis at each locus, and if loci interact with each other, there will be a corresponding increase in the interactions between all pairs of loci. The following example will make this clearer.

TABLE 3
The genotypic array for two symmetrically overdominant loci showing frequencies under random mating and fitnesses

Genotype   Frequency                Fitness
AA BB      g11²                     W1 W2
AA Bb      2 g11 g10                W1
AA bb      g10²                     W1 W2
Aa BB      2 g11 g01                W2
Aa Bb      2 (g11 g00 + g10 g01)    1
Aa bb      2 g10 g00                W2
aa BB      g01²                     W1 W2
aa Bb      2 g01 g00                W1
aa bb      g00²                     W1 W2
Consider two symmetrically overdominant loci, with fitnesses shown in Table 3. The ratio of the fitness of an individual homozygous AA to that of the heterozygote Aa is

$$
\frac{W_{AA}}{W_{Aa}}
=\frac{\left(g_{11}^{2}W_1W_2+2g_{11}g_{10}W_1+g_{10}^{2}W_1W_2\right)\big/\left(g_{11}^{2}+2g_{11}g_{10}+g_{10}^{2}\right)}
{\left(g_{11}g_{01}W_2+g_{11}g_{00}+g_{10}g_{01}+g_{10}g_{00}W_2\right)\big/\left(g_{11}g_{01}+g_{11}g_{00}+g_{10}g_{01}+g_{10}g_{00}\right)}
=(1-s_1)\,\frac{(1-s_2)p_A^{2}+2s_2g_{11}g_{10}}{p_Aq_A-s_2\left(g_{11}g_{01}+g_{10}g_{00}\right)}\cdot\frac{q_A}{p_A}
\qquad (10)
$$

where s1 = 1 − W1, s2 = 1 − W2, and p_A = 1 − q_A = frequency of the A gene.

Because of the symmetry in selective values, at equilibrium the gene frequency at each locus will be ½, and g11 = g00 = ¼ + D; g10 = g01 = ¼ − D. Then

$$
\frac{W_{AA}}{W_{Aa}}
=(1-s_1)\,\frac{\tfrac{1}{4}(1-s_2)+2s_2\left(\tfrac{1}{16}-D^{2}\right)}{\tfrac{1}{4}-2s_2\left(\tfrac{1}{16}-D^{2}\right)}
=(1-s_1)\,\frac{1-s_2\left(\tfrac{1}{2}+8D^{2}\right)}{1-s_2\left(\tfrac{1}{2}-8D^{2}\right)}
\qquad (11)
$$

Taking s1 = s2 = s, since 0 < s < 1 and 0 ≤ |D| ≤ ¼, expression (11) has a maximum value of (1 − s) when D = 0, and a minimum, (1 − s)², when D = ±¼. For small s, (11) is approximately

$$(1-s_1)\left(1-16D^{2}s_2\right) \qquad (12)$$
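The algebra leading to (11) can be checked numerically by computing the marginal fitnesses directly from the genotype frequencies and fitnesses of Table 3. A minimal sketch with hypothetical function names, assuming gene frequencies of .5, so that g11 = g00 = ¼ + D and g10 = g01 = ¼ − D:

```python
def marginal_ratio(g11, g10, g01, g00, s1, s2):
    """W_AA / W_Aa from genotype frequencies and fitnesses (first form of (10)).
    g11 = AB, g10 = Ab, g01 = aB, g00 = ab gamete frequencies."""
    w1, w2 = 1 - s1, 1 - s2
    w_AA = (g11 ** 2 * w1 * w2 + 2 * g11 * g10 * w1 + g10 ** 2 * w1 * w2) / (g11 + g10) ** 2
    het = g11 * g01 * w2 + g11 * g00 + g10 * g01 + g10 * g00 * w2
    w_Aa = het / ((g11 + g10) * (g01 + g00))
    return w_AA / w_Aa


def ratio_eq11(D, s1, s2):
    """Closed form (11), valid at gene frequencies 1/2."""
    return (1 - s1) * (1 - s2 * (0.5 + 8 * D * D)) / (1 - s2 * (0.5 - 8 * D * D))


for D in (0.0, 0.05, 0.1, 0.2, 0.25):
    direct = marginal_ratio(0.25 + D, 0.25 - D, 0.25 - D, 0.25 + D, 0.1, 0.1)
    print(D, round(direct, 6), round(ratio_eq11(D, 0.1, 0.1), 6))
```

For s1 = s2 = .1 the two columns agree, and the ratio falls from .9 at D = 0 to .81 at D = ±¼, the two limits stated above.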
It is clear from (11) that the selection coefficients at each locus are a function not only of the effects at that locus, but also of selective differences at all other loci which are correlated with it in gene frequency. We can therefore distinguish two kinds of contribution to the selective differences at a locus: the intrinsic and the extrinsic selective values. The former is the effect on the fitness of an individual organism of substituting, as by mutation, one genotype for another at a locus. For example, in the above model the effect of substituting a homozygote at the first locus for
a heterozygote is to reduce the fitness of the organism to $(1 - s_1)$ times its former
value. The intrinsic selective value of a genotype is not necessarily independent
of the background genotype. The extrinsic selective value is determined solely
by the background genotype, and is the combined contribution of all other loci.
The resultant, often referred to as the marginal fitness (LEWONTIN and KOJIMA
1960; BODMER and FELSENSTEIN 1967), or the apparent selective value (SVED
1968), is closely related to FISHER’S (1941) concept of the average excess of a
gene substitution, as opposed to the effect of substituting one allele for another
(the average effect of a gene substitution). There are important differences, however.
FISHER’S concepts are defined for genes, not genotypes, and are a function
of the gene frequencies at the locus in question.
This phenomenon is very important when we consider more than two inter-
acting loci. Increasing the intensity of linkage disequilibrium in a block of loci
increases the selective differences at each locus, with a corresponding increase in
the interactions among the set of loci. We know from exact 2-locus theory that
epistasis can induce linkage disequilibrium, and there will be, under certain
circumstances, a positive feedback, i.e., linkage disequilibrium producing epi-
stasis which in turn causes stronger linkage disequilibrium. As we might expect,
this phenomenon is most marked when there are many segregating loci, for it is
only then that we have the potentiality for large contributions from other loci to
the selective values at a locus.
In a finite population, where initial disequilibrium is generated by sampling,
we might expect a localized increase in linkage disequilibrium which then spreads
to all loci in a block. For example, a number of loci might by chance become
highly associated, with each component temporarily having a marginal fitness
approximately equal to the product of the fitnesses of all associated loci. These loci
will now interact strongly with other adjacent loci, and eventually many other
loci should “crystallize” into a large supergene. Such “crystallizations” appear
to occur in the simulation experiments. Figure 5 shows this process in one run.
The abscissa represents position along the chromosome from locus 1 to locus 36
and the ordinate shows the value of D between adjacent loci for each interval.
Each curve is a “map” of D in a successively later generation. As the figure
shows, nearly all values of D are low for the first 120 generations. In generation
60 there is a rather high value at interval 7-8, and this develops into a crystallization
point. A second high point in generation 60 is at interval 15-16, but this
turns out not to be exactly at the crystallization nucleus, which is interval 14-15.
Especially dramatic changes are seen to the left of interval 7-8 and to the right
of interval 14-15, where between generations 150 and 200 the correlation is
pulled from very low values to nearly unity by the presence of the “crystallization
nuclei.”

FIGURE 5.-Maps of linkage association along the 36-locus chromosome in successive genera-
tions of simulation with N = 667, r = .003, w = 0.9. Ordinate shows D between adjacent loci;
abscissa the position on the chromosome. Dots: generation 60; crosses: generation 160; squares:
generation 170; large circles: generation 180; triangles: generation 190; small circles: generation
200.
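Figure 5 is built from one number per interval. As an illustration only, the sketch below computes the usual disequilibrium D between each pair of adjacent loci from a gametic array, together with its squared correlation ρ². This section does not state whether the ordinate of Figure 5 is the raw D or a standardized measure, so the sketch returns both. The function name and the input format (gametes as 0/1 strings mapped to frequencies) are our own assumptions.

```python
def adjacent_ld(gametes):
    """Linkage disequilibrium between each pair of adjacent loci.

    gametes: dict mapping a string of '0'/'1' characters (one per locus) to a frequency.
    Returns one (D, rho2) tuple per interval, with
    D = P(1 at locus i and 1 at locus i+1) - p_i * p_{i+1}
    rho2 = D**2 / (p_i q_i p_{i+1} q_{i+1}).
    """
    total = sum(gametes.values())
    n_loci = len(next(iter(gametes)))
    result = []
    for i in range(n_loci - 1):
        p_i = sum(f for g, f in gametes.items() if g[i] == "1") / total
        p_j = sum(f for g, f in gametes.items() if g[i + 1] == "1") / total
        p_11 = sum(f for g, f in gametes.items() if g[i] == "1" and g[i + 1] == "1") / total
        D = p_11 - p_i * p_j
        denom = p_i * (1 - p_i) * p_j * (1 - p_j)
        # rho2 is undefined when either locus is fixed; report 0 in that case
        rho2 = D * D / denom if denom > 0 else 0.0
        result.append((D, rho2))
    return result
```

Applied to a pure coupling array, 000 and 111 each at frequency .5, it gives D = .25 and ρ² = 1 in both intervals, the maximal association. A random-combination array gives D = 0.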

The hypothesis outlined above accounts very well for the difference between
the simulations at different recombination values. Evidently the disequilibrium
generated by finite population size induces enough interaction between loci to
increase linkage disequilibrium deterministically for r = .003, but not for r =
.005. In the latter case the genotypic array proceeds to a new high equilibrium
value only if the average value of D* is increased to approximately 0.14. This
could be achieved by reducing the population size. This hypothesis predicts that
there will exist, for an appropriate set of selection coefficients and recombination
values, two kinds of stable equilibria for three or more loci. One of these is the
array in which all loci are in linkage equilibrium (i.e., all D = 0) , and the other,
a set of equilibria with linkage disequilibrium parameters not equal to zero.
Numerical calculations with five loci support this prediction (see Figure 4).
Robustness of the system to changes in the model
1) Asymmetrical fitnesses
In the symmetrical model chosen above, the fitness of a genotype is a function
only of the number of homozygous loci, irrespective of whether they are 1/1 or
0/0. Any of the $2^{36}$ possible gametic types, and its complement, may predominate,
and the stable gametic frequencies will be indistinguishable. To put it
another way, the stability of the system is invariant to interchange of the alleles
at any locus or set of loci. This is clearly an artifact of the symmetry of the
fitnesses assigned to each locus. To investigate the effect of asymmetry on the
system, we chose a rather extreme case, namely the set of fitnesses $W_{00} = 0.9375$,
$W_{01} = 1$, $W_{11} = 0.75$ at each locus, which in the absence of interaction with other
loci predicts a stable equilibrium gene frequency of 0.8 for the favored allele in
each case. This set of fitnesses was chosen to give the same genetic load per locus
at equilibrium as the symmetrical case with $W_{00} = W_{11} = 0.9$. Simulation using
an effective population size of 667 and a recombination fraction of r = .003
showed that, as before, the initial array of gametic types in linkage equilibrium
is reduced to a few predominant gametic types in roughly equal frequency.
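The two claims above (an equilibrium of 0.8 for the favored allele, and the same load per locus as the symmetrical case with both homozygotes at 0.9) follow from the standard single-locus overdominance equilibrium p = t/(s + t), where s and t are the heterozygote's advantage over each homozygote. A quick check, with a function name of our own:

```python
def overdominant_equilibrium(w_hom_a, w_hom_b, w_het=1.0):
    """Equilibrium frequency of allele a and mean fitness at one overdominant locus.

    Uses p_a = t / (s + t), with s = w_het - w_hom_a and t = w_het - w_hom_b.
    """
    s = w_het - w_hom_a
    t = w_het - w_hom_b
    p = t / (s + t)
    q = 1.0 - p
    w_bar = p * p * w_hom_a + 2 * p * q * w_het + q * q * w_hom_b
    return p, w_bar


# Asymmetrical case: favored homozygote .9375, other homozygote .75.
p_asym, w_asym = overdominant_equilibrium(0.9375, 0.75)
# Symmetrical case: both homozygotes .9.
p_sym, w_sym = overdominant_equilibrium(0.9, 0.9)
```

The asymmetrical case gives p = 0.8 for the favored allele and the symmetrical case gives p = 0.5. In both, the mean fitness is .95, so the load per locus, 1 - W̄, is .05.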
Table 4 shows the gametic composition after 700 generations of selection. The
array shown in this table is not yet at equilibrium, the third gametic type shown
in the array having arisen from a negligible frequency only in the last 40 generations
of the run. Reduction in the variety of gametic types at this point in the
run was so slow, however, that it was terminated for reasons of economy. Nevertheless,
the result, even at this point, is obvious and not clearly different from the
symmetrical fitness runs.

TABLE 4
Gametic array near equilibrium from simulation with asymmetrical fitnesses
$W_{00} = .9375$, $W_{01} = 1$, $W_{11} = .75$.
Frequencies are 10-generation averages after 700 generations of selection
starting in linkage equilibrium.

Gamete                                              Frequency
111 010 110 101 110 101 001 011 110 101 101 101       .265
000 111 001 010 101 110 110 111 101 111 010 110       .134
000 111 001 010 101 110 110 111 101 111 010 111       .108
111 101 111 101 111 011 111 100 111 010 111 011       .111
Others                                                .392
All $2^{35}$ distinguishable equilibria in the asymmetrical case with non-zero
linkage disequilibrium appear to be stable, at least insofar as we were able to
push runs close to equilibrium. On the average, one of these will be chosen, by
chance, with an approximately equal number of favored and unfavored alleles
on each gamete.
One effect of asymmetry was that there was a greater occurrence of fixation
at each locus, despite the fact that at equilibrium all the unfixed loci are near a
gene frequency of .5. This suggests that the $2^{35}$ equilibria are not all equally
stable, in the sense that the returning force toward equilibrium from a random
perturbation is not equally strong. In order to better understand the multiple
equilibria in the asymmetric cases, we returned to a 5-locus model for which
exact numerical evaluation is possible. The model had the same selection and
linkage parameters as the 36-locus asymmetrical case. Because of the existence
of multiple stable states of unknown domains of stability, we used a numerical
method that traces the trajectory of gametic frequencies from some initial frequency
array and then finds the stable point in the domain of that trajectory
by a method similar to the technique of “hill-climbing.” As initial conditions
for each case we chose a different pair of complementary gametes, such as 00000
and 11111 or 01101 and 10010, gave them each a frequency of .5, and searched
for the stable point. Because all loci are identical in action, there is complete
symmetry between loci 1 and 5, and between loci 2 and 4. Thus, there are only
ten distinguishable complementary pairs, and we looked for stable points corresponding
to all of these. The complete gametic array of all 32 gametic types for all
ten cases is too extensive to display, but Table 5 attempts to summarize their
main features. The first column symbolizes the initial condition from which the
stable state was reached, i.e., 00000 means that the initial gametic frequencies
were .5 of that gametic type and .5 of its complement. The next two columns
show the four most common gametic types at equilibrium and their frequencies.
The next four columns show the values of $\rho^2$ between adjacent genes. The eighth
column gives the frequency of allele 1 at locus 3 (it differs slightly over loci
depending upon distance from the end of the chromosome), and the last column
gives the mean fitness at equilibrium.
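Because the five loci act identically, reversing the order of the loci (locus 1 with 5, locus 2 with 4) maps a gametic array onto an equivalent one, and so does interchanging the two alleles at every locus. The sketch below counts classes of 5-locus gametes under these two operations, which is one way to arrive at the ten initial pairs. The function is our own illustration, not the authors' procedure.

```python
from itertools import product


def distinct_complementary_pairs(n_loci, use_reversal=True):
    """Count classes of n-locus gametes identified under complementation
    (swapping 0 and 1 at every locus) and, optionally, reversal of locus order."""
    flip = str.maketrans("01", "10")
    classes = set()
    for bits in product("01", repeat=n_loci):
        g = "".join(bits)
        comp = g.translate(flip)
        variants = {g, comp}
        if use_reversal:
            variants |= {g[::-1], comp[::-1]}
        classes.add(min(variants))  # canonical representative of the class
    return len(classes)
```

`distinct_complementary_pairs(5)` gives 10. Without the reversal symmetry the count is 16, that is 2^5/2 complementary pairs; the same reasoning gives 2^36/2 = 2^35 complementary pairs for 36 loci.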
Table 5 shows that in 7 out of 10 cases, stable states with linkage disequilibrium
were found, while in 3 cases only linkage equilibrium is stable. The cases have
been listed in order of decreasing $\bar{W}$. This is also the order of decreasing deviation
of gene frequencies from the single-locus prediction of 0.2 : 0.8. As the first
columns show, the stable points with the highest values of $\bar{W}$ and greatest deviation
of gene frequencies toward .5, and therefore the most stable against random
fixation, are those corresponding to what might be called “coherent” chromosomes,
those in which 0 and 1 alleles are in blocks rather than scattered and
intermixed. The chromosomes with dispersed 1 and 0 alleles are those with a
high “recombination index” according to FRASER (1967). He pointed out that
for normalizing selection, the higher the recombination index, as measured by
the number of switches from 1 to 0 and from 0 to 1 along the chromosome, the
smaller the loss in fitness because of “recombination load.” In our case, with
heterotic selection, we observe the opposite effect, coherent chromosomes showing
the greatest fitness, probably because of the asymmetry of fitnesses.

TABLE 5
Alternate stable states for a 5-locus asymmetrical model
$W_{00} = .9375$, $W_{01} = 1$, $W_{11} = .75$, and r = .003.
Frequency of allele 1 at locus 3 is p(3).

Initial   Most common    Frequency at        ρ² between loci
pair      gametes        equilibrium     1,2    2,3    3,4    4,5     p(3)    W̄
?         00000          .6683           .7210  .7919  .7919  .7210   .2481   .7876
          11111          .1637
          00001          .0221
          10000          .0221
?         00000          .5104           .6494  .7107  .6556  .0350   .2345   .7813
          00001          .1687
          11110          .1416
          00010          .0263
?         00000          .4630           .5965  .6369  .0482  .0417   .2283   .7791
          00010          .1724
          00001          .0574
          11100          .0509
?         00000          .4619           .5681  .0529  .0529  .5681   .2078   .7790
          00100          .1751
          11011          .1105
          11000, 00011   .0386 each
?         00000          .4607           .5203  .5225  .0364  .2878   .2192   .7776
          11100          .1205
          00011          .1123
          00010          .0644
?         00000          .4052           .0207  .5196  .5196  .0207   .2207   .7769
          00001          .1235
          10000          .1235
          01110          .1133
?         00000          .3808           .2386  .0161  .2440  ?       .2063   .7749
          00001          .1034
          00110          .0862
          11000          .0782
00101     00000          .3277           0.0    0.0    0.0    0.0     .2000   .7738
01001     00001          .0819
01010     00010          .0819
          00100          .0819
          01000          .0819
          10000          .0819
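FRASER's recombination index is described here only as the number of switches between 0 and 1 along the chromosome. The one-line count below assumes exactly that, without any normalization FRASER may have applied, and is meant only to make the distinction between “coherent” and scattered gametes concrete.

```python
def switch_count(gamete):
    """Number of 0->1 or 1->0 switches between adjacent loci along a gamete string."""
    return sum(a != b for a, b in zip(gamete, gamete[1:]))
```

The three initial pairs in Table 5 that lead only to linkage equilibrium, 00101, 01001 and 01010, give 3, 3 and 4 switches. A coherent gamete such as 11100 gives 1, and a gamete and its complement always give the same count.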
We call special attention to the phenomenon of equilibrium gene frequencies
being closer to .5 when there is linkage. While the effect is very pronounced in
36-locus models (Table 4), the 5-locus models show it (Table 5), and so do all the
asymmetrical models discussed by LEWONTIN (1964a,b), as well as the symmetrical
model of BODMER and PARSONS (1962). The reason for the effect is clear.
As linkage tightens, we are more and more dealing with a supergene with
“pseudoalleles.” While the equilibrium frequency of allele 1 predicted for single alleles with homozygous
fitnesses .9375 and .75 (so that $s = 1 - W_{00} = .0625$ and $t = 1 - W_{11} = .2500$) is

$$\hat{p} = \frac{s}{s + t} = \frac{.0625}{.0625 + .2500} = .20,$$

the equilibrium predicted for “pseudoalleles” with homozygous fitnesses
$(.9375)^{36} = .09793$ and $(.75)^{36} = .00003178$ is

$$\hat{p} = \frac{s}{s + t} = \frac{.90207}{.90207 + .99997} = .474.$$

In general,

$$\hat{p} = \frac{1 - W_{00}^{\,n}}{2 - \left(W_{00}^{\,n} + W_{11}^{\,n}\right)}, \qquad\text{so}\qquad \lim_{n \to \infty} \hat{p} = \tfrac12 \quad\text{if } 0 < W_{00}, W_{11} < 1.$$

This effect means that tight linkage and strong linkage disequilibrium are
preservers of genic variation, and that relative selective values of single gene
substitutions cannot be judged from allele frequencies. The frequencies of polymorphic
alleles observed by LEWONTIN and HUBBY (1966) in their study of
enzyme polymorphism thus bear no necessary relationship to the relative selective
values of the various genotypes when viewed as single-locus effects. The
selection of the chromosome as a whole is the overriding determiner of allelic
frequencies.
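The pseudoallele argument treats n completely linked loci as a single locus whose homozygotes have fitnesses W^n. A minimal sketch, assuming homozygote fitnesses .9375 and .75 per locus and a heterozygote fitness of 1 as in the asymmetrical runs, reproduces the single-locus value .20 and shows the equilibrium moving toward .5 as n grows. `pseudoallele_equilibrium` is a name of our own.

```python
def pseudoallele_equilibrium(w_hom0, w_hom1, n):
    """Equilibrium frequency of allele 1 when n completely linked loci behave as one
    multiplicative 'supergene' with homozygote fitnesses w_hom0**n and w_hom1**n
    (heterozygote fitness 1)."""
    s = 1 - w_hom0 ** n  # disadvantage of the all-0 homozygote
    t = 1 - w_hom1 ** n  # disadvantage of the all-1 homozygote
    return s / (s + t)
```

For n = 1 the function returns .20; for n = 36 it returns about .474; for very large n it is indistinguishable from .5.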
2) Unequally spaced loci
We do not expect in natural populations that a sample of heterotic loci will
be equally spaced along a chromosome. Accordingly we investigated a model in
which the positions of the loci were determined by a sample from a uniform
distribution between 0 and .006 with the average distance between loci being
0.003. There was very little difference in the gametic array compared to the runs
in which loci were equally spaced, and, if anything, the average value of D* was
a little greater.

3) Other selection models


The model of individual gene effects on fitness multiplying each other is to
some extent unrealistic and has been strongly criticized as naive by MAYR (1963),
KING (1967), MILKMAN (1967), and SVED, REED and BODMER (1967). They
point out that a more realistic model of selection is one in which some proportion
of the population survives, irrespective of the exact genotypic composition,
reflecting the severity of the environment. Thus $\bar{W}$ does not change as the population
evolves, since $\bar{W}$ is simply the proportion of the population surviving. Rather,
the relative fitnesses of the various genotypes change.
In particular, KING suggested a specific model in which the value of some
phenotypic character is increased linearly with increasing heterozygosity but
with incomplete heritability. The result is that if many loci are involved, pheno-
type will be normally distributed. Selection then occurs by truncation, saving the
proportion K of the population with the highest phenotypic score. This model
has many interesting aspects and is worthy of a completely separate paper. For
our present purposes we show the results of a single case. Setting the mean fitness
at .20 (80% of the population culled) and recombination between adjacent loci
at .003 provides a model close to the multiplicative cases we have dealt with. In
addition heritability in the broad sense was specified as 10%. Table 6 shows the
resulting gametic types and their frequencies in several runs. Comparison with
Table 2 shows that there is essentially no difference.
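KING's model is only outlined here. The sketch below is our own minimal reading of it and is not the authors' simulation code. It assumes that the phenotype is the number of heterozygous loci plus Gaussian environmental noise, scaled so that genetic variance is 10% of phenotypic variance, and that the top 20% of the population is retained. It performs one round of selection only. Because a fixed proportion always survives, the mean fitness stays at .20 however the population evolves.

```python
import random
import statistics


def heterozygosity(genotype):
    """Number of loci at which the two gametes of a genotype carry different alleles."""
    g1, g2 = genotype
    return sum(a != b for a, b in zip(g1, g2))


def truncation_select(population, keep=0.20, h2=0.10, rng=None):
    """Keep the top `keep` fraction of a population ranked on a noisy phenotype.

    Phenotype = number of heterozygous loci + Gaussian environmental noise whose
    variance is set so that genetic variance / phenotypic variance = h2.
    """
    if rng is None:
        rng = random.Random(0)
    values = [heterozygosity(g) for g in population]
    var_g = statistics.pvariance(values)
    var_e = var_g * (1 - h2) / h2 if var_g > 0 else 1.0
    sd_e = var_e ** 0.5
    scores = [v + rng.gauss(0.0, sd_e) for v in values]
    n_keep = round(keep * len(population))
    ranked = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    return [population[i] for i in ranked[:n_keep]]
```

Each genotype is a pair of equal-length 0/1 tuples. Iterating this step with random mating and recombination between generations would be needed to approach the runs in Table 6.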
Thus, once again our results are robust to a change in the model. These observations
are in accord with the results reported by WILLS, CRENSHAW and VITALE
(1970), who simulated a multilocus model using a truncation selection scheme.
They too found the build-up of pronounced linkages with this selection scheme.

TABLE 6
Equilibrium gametic arrays from proportional selection model of KING,
with $\bar{W}$ = .20, $h^2$ = .10, r = .003

Run 1, N = ?, started in linkage equilibrium                  Frequency
000 101 110 001 001 010 101 111 110 011 001 000                 .406
111 010 001 110 110 001 000 000 001 100 110 111                 .405
Others                                                          .189
Run 2, N = ?, started from complete coupling
111 111 111 111 111 111 111 111 111 111 111 111                 .401
000 000 000 000 000 000 000 000 000 000 000 000                 .380
Others                                                          .219

Passing to the limit
We have seen in our models that the effects of linkage on the genetic composition
of the population increase as the number of loci considered increases. Thirty-six-locus
models show much greater correlation between loci than do 5-locus
models for the same selection intensities, and both show effects not predictable
from exact 2-locus theory. Clearly, higher-order interactions among loci are not
negligible. We cannot, however, continue to increase the number of loci indefinitely
while holding the effect of a gene substitution constant. Any realistic
view of the genome must suppose that as the number of segregating loci affecting
fitness increases, the effect of a single gene substitution must decrease.
Suppose we have a chromosome of some fixed map length l. Moreover, suppose
that the fitness of chromosomal homozygotes for this element is K as compared
to unity for a chromosomal heterozygote. (We are again assuming a symmetrical
model so that all chromosomal homozygotes have the same fitness.) What will
happen as we pack more and more segregating genes into this fixed map length
with a fixed amount of inbreeding depression? Two compensatory effects will
occur. As more genes are packed into the linkage group, the recombination distance
between adjacent loci will decrease in proportion to 1/n, where n is the number of loci,
but the effect of a gene substitution on fitness will grow smaller. In a multiplicative model,
the fitness, 1 - s, of a homozygote at any one locus will be

$$1 - s = K^{1/n} \qquad\text{or}\qquad \ln(1 - s) = \frac{\ln K}{n} \tag{13}$$

For large n, 1 - s will be very close to 1, so that the effect of a single gene substitution will be

$$s \approx -\frac{\ln K}{n}.$$

That is, it will decrease in proportion to 1/n. The degree of epistasis as measured by
the deviation, E, from additivity is proportional to $s^2$, however, so epistasis decreases
as $1/n^2$. Now 2-locus theory suggests that the effect of linkage and selection
as measured by correlation between loci depends upon the ratio E/r. If this ratio
is large, either because epistasis is great or because recombination is rare, there
will be a large correlation between loci. If it falls below a critical value, there
will be no correlation at equilibrium. For the present case, then, $E/r \sim n/n^2 \sim 1/n$.
Thus 2-locus theory predicts that as we increase the number of loci packed
into a fixed chromosome length with a fixed inbreeding depression, the correlation
between loci should decrease. This does not take into account, however, the
higher-order interactions. Are these sufficient to counterbalance the decrease in
effect, or will the importance of linkage grow vanishingly small?
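The scaling argument can be made concrete. Assuming, as the text does, that E = s², 1 - s = K^(1/n) and r = l/(n + 1), the sketch below computes E/r for increasing n. `epistasis_to_recombination` is our own name. K = 0.9^36 (about .0225) and l = 0.1 are used only as example values.

```python
def epistasis_to_recombination(K, n, l):
    """E / r for n equally spaced loci sharing a total homozygous fitness K over map length l.

    1 - s = K ** (1 / n)   (relation (13))
    E = s ** 2             (epistatic deviation in the multiplicative model)
    r = l / (n + 1)        (adjacent recombination, since l = (n + 1) r)
    """
    s = 1 - K ** (1 / n)
    E = s * s
    r = l / (n + 1)
    return E / r


K = 0.9 ** 36  # approx. .0225, the "strong selection" value
for n in (2, 5, 18, 36, 360, 3600):
    ratio = epistasis_to_recombination(K, n, 0.1)
    # n * ratio levels off for large n, i.e. E / r falls off roughly as 1 / n
    print(n, ratio, n * ratio)
```

Going from 36 to 360 loci reduces E/r about tenfold, as the 1/n argument predicts. Figure 6 shows that the realized correlation does not follow this prediction.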
To answer these questions we have carried out a large set of numerical calculations
and simulations with two different values of K (.0225 and .4832), a range
of total map length up to 60 centimorgans, and n = 2, 5, 18, 36, and 360.
The total map length is measured by

$$l = (n + 1)\,r$$

This expression ignores the nonlinear relationship between map length and
recombination values. There will, however, only be a significant discrepancy
when we are dealing with large chromosome segments, and we are concerned
in this paper primarily with small recombination fractions between loci. We
have chosen $l = (n + 1)r$ because the average recombination between a pair of
adjacent randomly distributed loci on a chromosome of length l is $r = l/(n + 1)$.
The average squared correlation at equilibrium among all pairs of loci, $\bar{\rho}^2$, is
used as the measure of the effect of linkage. For two loci $\bar{\rho}^2$ is analytically a
linear function of l. From LEWONTIN and KOJIMA (1960), for this symmetrical
case

$$D = \pm\frac14\sqrt{1 - \frac{4r}{(1 - W)^2}},$$

where W is the fitness of a single-locus homozygote. Since all gene frequencies are 1/2,
$\rho^2 = D^2/(p_A q_A p_B q_B) = 16D^2$, and with n = 2 we have l = 3r and $W = \sqrt{K}$, so

$$\bar{\rho}^2 = 1 - \frac{4r}{(1 - W)^2} = 1 - \frac{4l}{3\left(1 - \sqrt{K}\right)^2}.$$

For five loci, exact numerical computation was employed and $\bar{\rho}^2$ was directly
calculated over all pairs of loci. For 18, 36, and 360 loci, simulation was used,
and $\bar{\rho}^2$ was estimated from the variance relation (3) given earlier in this paper.
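The symmetrical 2-locus model has the closed-form equilibrium D = ±(1/4)√(1 - 4r/(1 - W)²) (LEWONTIN and KOJIMA 1960), and ρ² = D²/(p_A q_A p_B q_B) = 16D² when all gene frequencies are 1/2. The 2-locus line of Figure 6 can therefore be computed directly. The sketch assumes n = 2, so that l = 3r and the single-locus homozygote fitness is W = √K. It clips ρ² at zero where the disequilibrium equilibrium ceases to exist. The function name is our own.

```python
import math


def rho2_two_locus(l, K):
    """Equilibrium squared correlation for the symmetrical 2-locus multiplicative model.

    Single-locus homozygote fitness W = sqrt(K); r = l / 3 because l = (n + 1) r with n = 2.
    Returns 0 where the equilibrium with linkage disequilibrium does not exist.
    """
    W = math.sqrt(K)
    r = l / 3
    value = 1 - 4 * r / (1 - W) ** 2
    return max(value, 0.0)
```

For strong selection (K = .0225, W = .15), ρ² falls linearly from 1 at l = 0 and reaches zero near l = .54.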
The results of these calculations are shown in Figures 6a and 6b. In 6a, K =
0.0225, while in 6b, K = 0.4832. Table 7 gives the fitness of a single-locus
homozygote for these two cases with different numbers of loci, from relation (13).
Attention is called to the extremely small fitness effects per single locus for large
numbers of loci.
Figures 6a and 6b show a remarkable phenomenon. As predicted by 2-locus
theory, the line relating $\bar{\rho}^2$ to l has a negative slope, and the 5-locus line lies
entirely beneath the 2-locus line. That is, increasing the number of loci in the
interval and sharing out the fitness among them does lessen the effect of
linkage as measured by $\bar{\rho}^2$ for any given map length. When we pass from 5 to
18 loci, there is again a decrease in both cases, but a much smaller one than when
passing from 2 to 5 loci. When the loci are now doubled to 36, there is only a
barely perceptible change in the line in the case of strong selection and a small
change for weak selection. Finally, for 360 loci there is no change at all for the
stronger selection, while for the weak selection the simulation gives such high
variance between generations that it was not possible to identify an equilibrium
condition. Thus we have a property of invariance appearing as the number of
loci increases. For more than a couple of dozen loci, the average correlation between
genes on the chromosome is independent of the number of genes or their
individual effects and depends only on the total map length of the chromosome
and the total inbreeding depression. Apparently, higher-order effects do come
into play in a significant way with many genes so as to just cancel out the weakening
of first-order effects. Once again 2-locus theory is seen to be inadequate in an
important way for prediction of multilocus phenomena.

TABLE 7
Fitness of a homozygote at a single locus among n loci when
the fitness of the n-tuple homozygote = K

               K
n        .0225     .4832
2        .1501     .6951
5        .4683     .8646
18       .8100     .9604
36       .9000     .9800
360      .9895     .9980

FIGURE 6.-The relation between the average squared correlation among all pairs of loci ($\bar{\rho}^2$), on the
ordinate, and total map length, on the abscissa, for different numbers of loci making up the total
map. (a) Strong selection: K = 0.0225; (b) weak selection: K = 0.4832. Solid line: 2 loci; broken
line: 5 loci; dashed line: 18 loci (solid circles), 36 loci (crosses), 360 loci (open circles).
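Table 7 follows from relation (13), 1 - s = K^(1/n). The entries are reproduced to four decimals if K is taken to be 0.9^36 (about .0225) and 0.98^36 (about .4832), that is, the 36-locus values with single-locus homozygote fitnesses 0.9 and 0.98. This reading of K is our inference from the table and is not stated in the text.

```python
def single_locus_fitness(K, n):
    """Relation (13): fitness of a single-locus homozygote when n loci share total fitness K."""
    return K ** (1 / n)


K_STRONG = 0.9 ** 36   # approx. .0225 (assumed exact value behind the .1501 entry)
K_WEAK = 0.98 ** 36    # approx. .4832

for n in (2, 5, 18, 36, 360):
    print(n, f"{single_locus_fitness(K_STRONG, n):.4f}", f"{single_locus_fitness(K_WEAK, n):.4f}")
```

Using K = .0225 exactly would give .1500 rather than .1501 in the first row, which is why the powers of 0.9 and 0.98 are used here.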
While we have shown that $\bar{\rho}^2$ depends only on two parameters, l and K, it does
not depend simply on the single parameter, selection per unit map length. For
example, from Figure 6 we see that for 36 loci in the case of strong selection,
$\bar{\rho}^2$ = .8 when the homozygous load per centimorgan is .9775/9 = .109 (a load of
1 - K = .9775 spread over 9 centimorgans). The same homozygous load per centimorgan
for weak selection (.5168/.109) occurs for a map length of
4.7 centimorgans, where $\bar{\rho}^2$ = 0 for 36 loci.

GENERAL IMPLICATIONS OF THE RESULTS

Two-locus theory has shown that linkage disequilibrium will be generated


among loci interacting multiplicatively provided that the loci involved are suffi-
ciently tightly linked. We have shown in this paper, and this is not apparent
from 2-locus theory, that the degree of linkage disequilibrium between a pair of
loci is not simply a function of the fitnesses of the 2-locus system (i.e., the average
effects of gene substitution at these loci), but may be overwhelmingly determined
by the average effects of other loci forming a linked complex with the loci in
question.
It appears likely, from the data presented above, that the disequilibrium be-
tween pairs of loci is a function primarily of the map length of a chromosome
segment and the loss in fitness accompanying homozygosity of this segment.
Furthermore, this appears to hold under a wide range of conditions, and in
particular the average correlation in gene frequency between a pair of loci on
a chromosome segment is largely independent of the number of loci, and hence
of the average effect of a locus, in that segment. In general the average degree of
disequilibrium between a pair of loci on a chromosome segment will be somewhat
less than that expected between two loci that are the segment length apart and
share the whole load of the segment, and greater than that expected if the fitness differences
are spread uniformly along the map. The former value (i.e., the upper bound)
is easily calculated from 2-locus theory, but as yet we have been unable to derive
the values for the limiting case when the chromosome becomes a quasi-continuum.
The model discussed above provides a possible explanation for the origin and
persistence of inversions in natural populations. If we look at the gametic array
formed in one of the simulated populations, we find considerable organization,
and it should be apparent that an inversion encompassing the block of loci would
be at a selective advantage. If this were the case, it would not be necessary that
the selection coefficients which initially induced the disequilibrium within the
set of heterotic loci be maintained; it would only be necessary that some degree
of heterosis persist. If inversions do arise this way, it should be noted that
co-adaptation of the alleles within an inversion is not necessary. Heterosis is the only
prerequisite. Co-adaptation may, of course, arise later (see KOJIMA 1967).
The discovery that, when more than a couple of dozen genes are involved,
the gene number is irrelevant has far-reaching implications for the theory of
population genetics. While we commonly think of population genetics as the best
example of the successful application of mathematical theory in biology, much of
our confidence is unjustified. There is a striking discrepancy between the structure
of genetic theory and the observations of experiment and natural history. Population
genetic theory is framed in terms of the frequencies of alleles and the effect
on fitness of gene substitution at those loci. Observations, with the rare exception
of a few single-locus polymorphisms, deal not with the frequencies of alleles and
their fitness effects but with phenotypes dependent on the whole genome and
with differences in fitness associated with genetic differences of large pieces of
the genome. The recent work on enzyme polymorphisms has provided informa-
tion on frequencies of alleles at single loci. But it has told nothing about the effect
on fitness of single-locus substitutions. Indeed, the main hiatus between theory
and experiment is that theory talks about single-locus fitness, while no one has
ever invented a method for measuring it. Every population geneticist who has
ever worked with experimental animals and plants knows that with the exception
of a few major single-locus polymorphisms, the attempt to measure the differ-
ence in fitness between homozygote and heterozygote at a single locus is frustrat-
ing. First, most single-locus fitness effects must be very small and, second, it is
impossible to know whether the results of two crosses differ only at the locus in
question or in a whole block of genes surrounding the locus. The usual approach
of backcrossing to a common parent for eight or a dozen generations (more often
three or four generations) is self-delusion. Such backcrossing procedures certainly
do not free a gene of the surrounding genetic material with which it is associated.
Genes five units on either side of a marker will retain 50% of their correlation
with the marker for 14 generations of backcrossing even without selection. Thus,
the attempt to measure single-locus fitness effects for most loci is doomed to
failure until methods of analysis several orders of magnitude more sensitive than
we now have are created.
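The backcrossing figure can be checked with the usual approximation that an association with a marker survives each backcross generation with probability 1 - r, so that a fraction (1 - r)^t remains after t generations. The sketch assumes that 5 map units corresponds to r = .05, ignoring the map function and any selection. The function names are illustrative.

```python
import math


def retained_association(r, generations):
    """Fraction of the initial marker-gene association surviving t backcross
    generations, (1 - r) ** t, with no selection."""
    return (1 - r) ** generations


def generations_to_halve(r):
    """Number of backcross generations after which half of the association remains."""
    return math.log(0.5) / math.log(1 - r)
```

With r = .05, `retained_association(0.05, 14)` is about .49, and `generations_to_halve(0.05)` is about 13.5 generations.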
What the geneticist can and does measure, however, are the fitness effects that
occur when whole blocks of genes, usually whole chromosomes, are made homo-
zygous and heterozygous. A theory of population genetics that is framed in terms
of chromosome segments, their map length, and their fitness effects per unit
map length, would then be a theory that would finally link up with the facts of
observation, much as biometrical genetics was an attempt to build a theory in
terms of the observable statistics of phenotype. We need a new formalism of
population genetic theory in which the observables from experiment are the
entities of theory. Our results in this paper give us some hope that such a formula-
tion is possible.
But the problem is more than an epistemological one. Suppose one could succeed
in completely randomizing the surrounding genetic material of a given locus.
Suppose further that one could measure the very small fitness effects at that
locus on the now randomized genetic background. Of what use would that be?
As we show, and as other work on linkage has shown, genes do not exist in
populations on random backgrounds. On the contrary, correlated blocks are prob-
ably the normal states of genes and natural selection operates on the correlated,
not the uncorrelated, state. A theory of the correlated genome is then required.
It may be that such a theory is of necessity so complex that it will not allow of
simple analytic predictions and that the kind of numerical analysis we have per-
formed is all that we can hope for.

In the data presented in this paper the strong association between loci is a
result of the predominance in the gametic array of only two gametic types, and
this prediction does not appear to be compatible with the observed chromosome
variation in natural populations. There are several possible explanations for this
which are compatible with the above theory. One is that the whole chromosome
does not form a single super locus, but condenses into a series of complexes which
segregate more or less independently. Another factor, not introduced in the
simulations, is the possibility of multiple allelic overdominant systems, and these
offer the potential for a much greater array of chromosome types.
The final question we must concern ourselves with is a difficult one. Are
selection coefficients of sufficiently great magnitude to produce this phenomenon
in natural populations? It is difficult not only because we have little information
about an essential parameter of the system, namely the effect on fitness of
homozygosity for a segment of chromosome, but also we do not know if the multi-
plicative model is a realistic one for the interaction of loci affecting fitness. More-
over, we do not even know whether the most basic feature of all the models dis-
cussed, single-locus overdominance, is the rule in nature. All our models have
assumed single-locus overdominance in order that the gene frequencies will go
to some intermediate equilibrium. There is nothing about work on linkage either
in this paper or any previous paper that suggests that linkage alone can stabilize
gene frequencies in the absence of overdominance. In fact the reverse is true. It
has been shown, on the other hand, by LEWONTIN (1964b) that even in the
absence of overdominance the fixation of gene frequencies by selection may be
tremendously retarded by linkage. While overdominance at each locus has been
inserted in the present models simply in order to maintain genetic variance
indefinitely, we do not know at the present time how important it is for the effects
we have observed. For example, in the absence of overdominance but with ex-
tremely small selection coefficients favoring one homozygote over the other, the
change in gene frequencies might be extremely slow and all during the course of
transient polymorphism the striking linkage effects we have observed may occur.
It remains for further investigation to show how essential the assumption of over-
dominance is.
Let us consider as an example a chromosome arm in Drosophila with a map length of 0.4 morgans. Because there is no crossing over in the male, the map length averaged over the two sexes is half this, which corresponds to l = 0.2 in the graphs. How much selection would be necessary to ensure a high degree of linkage disequilibrium among all the loci on the arm, say, with r² = 0.5? We can easily calculate the upper bound for K, using the formula for two loci

$$K < \left[1 - \sqrt{\frac{4l}{3(1 - r^2)}}\,\right]^2 \approx 0.073$$
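The arithmetic of this example can be checked in a few lines. The sketch below evaluates the two-locus bound K < [1 - sqrt(4l / 3(1 - r²))]² for l = 0.2 and r² = 0.5, and reports the implied minimum depression 1 - K. It also gives the per-locus homozygote disadvantage s under our assumption that K = (1 - s)² is the fitness of the double homozygote in a two-locus multiplicative model. The function name is ours.

```python
import math


def k_upper_bound(l, r2):
    """Two-locus upper bound on K, the relative fitness of a homozygous
    chromosome arm: K < [1 - sqrt(4l / (3(1 - r2)))]^2."""
    return (1.0 - math.sqrt(4.0 * l / (3.0 * (1.0 - r2)))) ** 2


K = k_upper_bound(l=0.2, r2=0.5)
s = 1.0 - math.sqrt(K)  # per-locus disadvantage, assuming K = (1 - s)^2
print(f"K < {K:.3f}")               # K < 0.073
print(f"depression > {1 - K:.0%}")  # depression > 93%
print(f"per-locus s = {s:.2f}")     # per-locus s = 0.73
```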
We therefore require a depression in fitness of at least 93% (1 - 0.073 ≈ 0.93) accompanying homozygosity of this chromosome arm, and this seems an excessively high value. Viability estimates made in Drosophila indicate much smaller inbreeding depressions than this. Even though such studies ignore important components of fitness, such as sterility due to developmental and behavioral traits, depressions in fitness as large as 93% seem unlikely.
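The bound rests on exact two-locus theory for the multiplicative heterotic model. The sketch below iterates the standard deterministic recursion for the four gamete frequencies (AB, Ab, aB, ab), with fitness 1 for a heterozygous locus and 1 - s for a homozygous one, combined multiplicatively, and recombination fraction R. Starting from allele frequencies of 0.5, it settles at r² = 1 - 4R/s² whenever R < s²/4; that closed form is our own derivation for this symmetric case, not a formula quoted from the text. If the two loci in the bound are taken to lie on average l/3 apart (the mean distance between two points placed at random on an arm of length l), then s ≈ 0.73 and R = 0.2/3 return r² ≈ 0.5. That reading of the factor 3 is an assumption. All names are hypothetical.

```python
import math

# Gametes AB, Ab, aB, ab as (allele at A, allele at B); 0 = capital, 1 = lower case.
GAMETES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def fitness_matrix(s):
    """Multiplicative two-locus fitnesses: 1 if a locus is heterozygous,
    1 - s if it is homozygous."""
    def locus(i, j):
        return 1.0 if i != j else 1.0 - s
    return [[locus(g[0], h[0]) * locus(g[1], h[1]) for h in GAMETES]
            for g in GAMETES]


def next_generation(x, w, R):
    """Random union of gametes, viability selection, then recombination."""
    marginal = [sum(w[i][j] * x[j] for j in range(4)) for i in range(4)]
    w_bar = sum(x[i] * marginal[i] for i in range(4))
    D = x[0] * x[3] - x[1] * x[2]
    w_dh = w[0][3]  # double heterozygote; AB/ab and Ab/aB are equally fit here
    sign = [1.0, -1.0, -1.0, 1.0]  # with D > 0, recombination depletes AB and ab
    return [(x[i] * marginal[i] - sign[i] * R * w_dh * D) / w_bar
            for i in range(4)]


def correlation(x):
    pA, pB = x[0] + x[1], x[0] + x[2]
    D = x[0] * x[3] - x[1] * x[2]
    return D / math.sqrt(pA * (1 - pA) * pB * (1 - pB))


def equilibrium_r2(s, R, generations=2000):
    w = fitness_matrix(s)
    x = [0.35, 0.15, 0.15, 0.35]  # allele frequencies 0.5, positive D
    for _ in range(generations):
        x = next_generation(x, w, R)
    return correlation(x) ** 2


s = 1.0 - math.sqrt(0.0727)  # per-locus s implied by K = (1 - s)^2
R = 0.2 / 3.0                # assumed two-locus recombination fraction
print(equilibrium_r2(s, R), 1.0 - 4.0 * R / s ** 2)
```

When R exceeds s²/4 the same iteration drives the correlation to zero, which is the two-locus counterpart of the point made in the Summary that weak selection alone predicts no effect of linkage.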
On the other hand, these considerations, which are based on multiplicative interaction of loci, may be very misleading (SVED, REED and BODMER 1967; KING 1967), and alternative models proposed to sidestep many of the difficulties raised by genetic load arguments introduce much stronger epistasis than that discussed in this paper. We must emphasize that as the number of loci grows large, the deviation of a multiplicative model from a purely additive one becomes vanishingly small per locus pair. The epistatic deviation from additivity is equal to s², where s is the selection coefficient at each locus, so that if the effect of homozygosity at a locus is to reduce fitness by 1%, the epistatic deviation is only 0.01% per pair of loci. The expected degree of linkage disequilibrium depends on the model of gene interaction, and it appears reasonable to suppose that other models will predict much higher degrees of association. As mentioned earlier, we have investigated the truncation model discussed by KING, and indeed very strong linkage disequilibrium has been found for quite realistic models. We should also note that the generation of linkage disequilibrium appears to be one of the rare cases in population genetics where selection and drift complement each other: both tend to increase association between loci.
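The per-pair arithmetic is easy to make concrete. Two loci each homozygous with fitness 1 - s give (1 - s)² = 1 - 2s + s² under multiplication, which exceeds the additive value 1 - 2s by exactly s². The sketch below reproduces the 1% to 0.01% example, then, as a hypothetical illustration, spreads a fixed whole-arm homozygous fitness K over n equal loci, so that 1 - s = K^(1/n), to show the per-pair deviation shrinking as n grows. Function names are ours.

```python
def epistatic_deviation(s):
    """Multiplicative minus additive fitness of a double homozygote."""
    multiplicative = (1.0 - s) ** 2
    additive = 1.0 - 2.0 * s
    return multiplicative - additive  # algebraically equal to s**2


def per_locus_s(K, n):
    """Per-locus homozygote disadvantage when n equal multiplicative loci
    together give a homozygous-arm fitness K, i.e. (1 - s)**n = K."""
    return 1.0 - K ** (1.0 / n)


print(f"{epistatic_deviation(0.01):.2%}")  # 0.01%

K = 0.073  # hypothetical whole-arm value, borrowed from the Drosophila example
for n in (2, 10, 100, 1000):
    s = per_locus_s(K, n)
    print(n, round(s, 4), f"{epistatic_deviation(s):.4%}")
```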
Because this study emphasizes the possibility that almost neutral loci show
strong linkage disequilibrium over considerable linkage distances, there is a
greater need for experimental estimation of linkage disequilibrium in natural
populations.
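As a reminder of what such an estimate involves, here is a minimal sketch computing D and the correlation r from counts of the four gametic types at two diallelic loci. It assumes gametic phase is known (for example, from haploid or otherwise phased chromosomes) and uses the simple plug-in estimate without small-sample correction. The counts are invented for illustration and the function name is hypothetical.

```python
import math


def ld_from_gamete_counts(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r) estimated from counts of phased gametes."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    D = p_AB - p_A * p_B
    r = D / math.sqrt(p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, r


D, r = ld_from_gamete_counts(40, 10, 10, 40)  # hypothetical sample of 100 gametes
print(round(D, 3), round(r, 3))  # 0.15 0.6
```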

SUMMARY

This study examines systems of 2, 5, 18, 36, and 360 segregating loci affecting fitness. It chiefly deals with a model in which heterosis is multiplicative between loci, but also deals with proportional selection, asymmetrical fitness selection between alleles, and genes unequally spaced on the chromosome. The results turn out to be robust to these variations in the model. The findings are that:
(1) Multiple stable equilibria exist with different degrees of correlation between loci.
(2) Even when effects of single allelic substitution on fitness are so small that exact 2-locus theory predicts no effect of linkage, there is a strong correlation between genes when many loci are segregating.
(3) When fitnesses are strongly asymmetric at each locus, the equilibrium of gametic frequencies is such as to make allelic frequencies nearly 0.5. Thus, allele frequencies at individual loci do not reflect fitness effects at separate loci, but are a result of selection of the chromosome as a whole.
(4) Higher-order interactions among loci in multilocus systems increase in importance as the number of loci increases.
(5) For a fixed map length and a fixed amount of inbreeding depression per unit map length, the equilibrium correlation among genes along the chromosome is independent of
the number of genes or their individual effects.

This last point makes it possible to frame a theory of population genetics which does not contain individual loci explicitly, but deals only with whole chromosomes, their recombination properties, and the effect of homozygosity of segments of various lengths. Such a theory is more consonant with the observations possible in population genetics than a theory framed in terms of gene frequencies.

LITERATURE CITED

BODMER, W. F. and J. FELSENSTEIN, 1967 Linkage and selection: Theoretical analysis of the deterministic two locus random mating model. Genetics 57: 237-265.
BODMER, W. F. and P. PARSONS, 1962 Linkage and recombination in evolution. Advan. Genet. 11: 1-100.
FISHER, R. A., 1918 The correlation between relatives on the supposition of Mendelian inheritance. Trans. Roy. Soc. Edinburgh 52: 399-433.
FISHER, R. A., 1941 Average excess and average effect of a gene substitution. Ann. Eugenics 11: 53-63.
FRASER, A., 1967 Gametic disequilibrium in multi-genic systems under normalizing selection. Genetics 55: 507-512.
HILL, W. G. and A. ROBERTSON, 1968 Linkage disequilibrium in finite populations. Theoret. Appl. Genet. 38: 226-231.
KARLIN, S. and M. FELDMAN, 1969 Linkage and selection: New equilibrium properties of the two locus symmetric viability model. Proc. Natl. Acad. Sci. U.S. 62: 70-74.
KIMURA, M., 1965 Attainment of quasi-linkage equilibrium when gene frequencies are changing by natural selection. Genetics 52: 875-890.
KING, J. L., 1967 Continuously distributed factors affecting fitness. Genetics 55: 483-492.
KOJIMA, K., 1967 Likelihood of establishing newly induced inversion chromosomes in small populations. Ciencia e Cultura 19: 67-77.
LEWONTIN, R. C., 1964a The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49: 49-67.
LEWONTIN, R. C., 1964b The interaction of selection and linkage. II. Optimum models. Genetics 50: 757-782.
LEWONTIN, R. C. and J. L. HUBBY, 1966 A molecular approach to the study of genic heterozygosity in natural populations. II. Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics 54: 595-609.
LEWONTIN, R. C. and K. KOJIMA, 1960 The evolutionary dynamics of complex polymorphisms. Evolution 14: 458-472.
MAYR, E., 1963 Animal Species and Evolution. Harvard Univ. Press, Cambridge, Mass.
MILKMAN, R. D., 1967 Heterosis as a major cause of heterozygosity in nature. Genetics 55: 493-495.
OHTA, T. and M. KIMURA, 1969 Linkage disequilibrium due to random genetic drift. Genet. Res. 13: 47-55.
PRAKASH, S., R. C. LEWONTIN and J. L. HUBBY, 1969 A molecular approach to the study of genic heterozygosity in natural populations. IV. Patterns of genic variation in central, marginal, and isolated populations of Drosophila pseudoobscura. Genetics 61: 841-858.
SVED, J. A., 1968 The stability of linked systems of loci with small population size. Genetics 59: 543-563.
SVED, J. A., E. REED and W. F. BODMER, 1967 The numbers of balanced polymorphisms that can be maintained in a natural population. Genetics 55: 469-481.
WILLS, C., J. CRENSHAW and J. VITALE, 1970 A computer model allowing maintenance of large amounts of genetic variability in Mendelian populations. I. Assumptions and results for large populations. Genetics 64: 107-123.
WRIGHT, S., 1967 Surfaces of selective value. Proc. Natl. Acad. Sci. U.S. 58: 165-172.
