I. Franklin & R. C. Lewontin: Is the Gene the Unit of Selection?
The research described here was supported by the Atomic Energy Commission Contract AT(11-1)-1437.
Present address: Division of Animal Genetics, Commonwealth Scientific and Industrial Research Organization.
P.O. Box 90, Epping, New South Wales, 2121 Australia.
* From LEWONTIN (1964a), Table 9.
some containing other loci interacting with them, than when they are considered
in isolation. Such a result does not follow from simple considerations of correla-
tion and arises from higher-order interactions that do not exist in the two-locus
case.
Some of the results presented in this paper can be anticipated from the results
of 5-locus calculations made by LEWONTIN (1964a). Table 1 shows the equilib-
rium linkage relations for different map distances, expressed as the correlation
between loci,* for a pair of adjacent loci (locus 2 and locus 3) in a 5-locus multi-
plicative fitness model (Table 9, LEWONTIN 1964a). The model has a symmetry
such that the correlation for two loci can be calculated from exact 2-locus theory.
The last column gives the ratio of observed to expected. The table shows that
as the map distance between the loci is increased, the effect of embedding
adjacent loci in a multilocus chromosome grows greater and greater. The effect
becomes relatively most extreme for map distances above .0625, when exact 2-locus
theory predicts no correlation. We see two effects then. The correlation between
loci is greater than expected from 2-locus theory and the critical map distance,
above which there is no effect of linkage, is increased, although only slightly.
It is the purpose of this paper to push these two observations much farther
by increasing greatly the number of loci and examining a variety of selection and
linkage models in an approach to a realistic model of the genome. The results
turn out to be rather surprising.
METHODS
tempt was to derive essentially deterministic results by minimizing the stochastic effects. Finite-
ness of population size does turn out to be of some interest, however.
Monte Carlo simulation experiments were carried out using a general purpose program
written for the IBM 7094 at the University of Chicago. This program consists of a main deck
written in FORTRAN, which is primarily responsible for input and output, and a set of subroutines
written in assembly language concerned with mating, recombination, evaluation of
phenotype, and selection. Each of these subroutines has a set of optional decks which allow
different mating schemes and patterns of selection. The options used to generate the results
given in this paper were chosen so that the simulated populations had the following properties:
1) Population size
Males chosen in sequence and females chosen at random from an array in store are used to
generate progeny, which are then selected on the basis of their fitness. Progeny are generated
until a population size N (equal to the number of parents) is attained. Because males are chosen
in sequence, they have virtually no variance in offspring number due to Poisson sampling, while
the females have such variance, so that effective population size is nearly (4N)/3.
2) Recombination
Crossovers are generated with no interference: the number of crossovers in a segment with
map length A morgans is calculated by sampling from a Poisson distribution with mean A, and
the position of each crossover is determined by sampling from a uniform distribution [0, A].
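The recombination rule just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the original assembly-language routine: the function names, the use of Knuth's multiplication method for the Poisson draw, the random choice of starting strand, and the requirement that `locus_positions` be sorted in ascending order are all choices made here for the example.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson variate by Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, running_product = 0, rng.random()
    while running_product > limit:
        count += 1
        running_product *= rng.random()
    return count


def crossover_positions(map_length, rng):
    """Sorted crossover points on a segment of `map_length` morgans (no interference)."""
    n_crossovers = poisson_sample(map_length, rng)
    return sorted(rng.uniform(0.0, map_length) for _ in range(n_crossovers))


def recombine(gamete_a, gamete_b, locus_positions, map_length, rng):
    """Return one gamete produced from two parental gametes."""
    breaks = crossover_positions(map_length, rng)
    strands = [gamete_a, gamete_b]
    current = rng.randrange(2)          # assumed: start on a randomly chosen strand
    product, next_break = [], 0
    for locus, position in enumerate(locus_positions):
        # Switch strands once for every crossover lying before this locus.
        while next_break < len(breaks) and breaks[next_break] < position:
            current = 1 - current
            next_break += 1
        product.append(strands[current][locus])
    return product


rng = random.Random(1970)
positions = [i * 0.0025 for i in range(36)]   # 36 loci spaced r = .0025 apart
offspring = recombine([0] * 36, [1] * 36, positions, positions[-1], rng)
print("".join(map(str, offspring)))
```

With such a short map most offspring gametes are copies of one parental gamete, which is why recombinant types are rare in the tightly linked runs.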
3) Selection
a) Multiplicative fitnesses
The fitness of a genotype is the product of the fitnesses at the individual loci; each locus is
assumed to have an identical effect on fitness. We assign to each heterozygote a fitness $W_2$, and
to homozygotes 0/0 and 1/1 values $W_1$ and $W_3$. Then an individual with $n_1$ loci homozygous
0/0, $n_2$ loci heterozygous, and $n_3$ loci homozygous 1/1 will have a fitness $W_1^{n_1} W_2^{n_2} W_3^{n_3}$. This
value is compared to a uniform random variable between 0 and $W_2^{n}$. If the genotypic value is
greater than the random variable, the individual is saved; if less, discarded.
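As a hedged illustration of the multiplicative scheme, the sketch below computes the genotypic fitness and applies the acceptance test against a uniform random variable. The representation of a genotype as a list of allele pairs and the function names are assumptions made for this example only.

```python
import random


def multiplicative_fitness(genotype, w1, w2, w3):
    """W1^n1 * W2^n2 * W3^n3 for a genotype given as a list of allele-pair tuples."""
    n1 = sum(1 for pair in genotype if pair == (0, 0))   # 0/0 homozygotes
    n3 = sum(1 for pair in genotype if pair == (1, 1))   # 1/1 homozygotes
    n2 = len(genotype) - n1 - n3                         # heterozygotes
    return (w1 ** n1) * (w2 ** n2) * (w3 ** n3)


def survives(genotype, w1, w2, w3, rng):
    """Save the individual if its fitness exceeds a uniform draw between 0 and W2^n."""
    ceiling = w2 ** len(genotype)
    return multiplicative_fitness(genotype, w1, w2, w3) > rng.uniform(0.0, ceiling)


rng = random.Random(0)
all_homozygous = [(0, 0)] * 36
all_heterozygous = [(0, 1)] * 36
print(multiplicative_fitness(all_homozygous, 0.9, 1.0, 0.9))
print(survives(all_heterozygous, 0.9, 1.0, 0.9, rng))
```

Scaling the random variable by the fitness of the complete heterozygote means that the best possible genotype is always saved, so no progeny are wasted on rejections that carry no information.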
b) Proportional selection
Following a model of KING (1967), which postulates a fixed proportion of the generated
individuals surviving, a score is constructed by adding the number of heterozygous loci in an
individual to a random normal deviate. The individual is saved if the phenotypic score is greater
than a truncation point computed each generation so as to save a proportion R of the population.
The random number is chosen from a normal distribution with mean 0, and variance equal to
C times the variance of the number of heterozygous loci in the initial population. The initial
heritability is therefore 1/(C + 1).
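The proportional selection scheme can be sketched as follows. This is a minimal illustration, assuming that saving a proportion R by a truncation point is equivalent to keeping the top-ranked fraction R of phenotypic scores; the function names and the rounding of the number saved are assumptions of the example, not details taken from the original program.

```python
import random


def truncation_select(het_counts, proportion_saved, c, initial_het_variance, rng):
    """Return indices of individuals saved under truncation on a noisy score."""
    noise_sd = (c * initial_het_variance) ** 0.5
    # Phenotypic score = number of heterozygous loci + random normal deviate.
    scores = [h + rng.gauss(0.0, noise_sd) for h in het_counts]
    n_saved = round(proportion_saved * len(scores))
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return sorted(ranked[:n_saved])


def initial_heritability(c):
    """Genetic variance V over total variance V + C*V."""
    return 1.0 / (c + 1.0)


rng = random.Random(42)
het_counts = [rng.randint(10, 26) for _ in range(20)]   # illustrative values only
print(truncation_select(het_counts, 0.2, 9.0, 9.0, rng))
print(initial_heritability(9.0))
```

Because the environmental variance is fixed at C times the initial genetic variance, the realized heritability will drift away from 1/(C + 1) as the variance in heterozygosity changes during a run.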
Because the word size of the IBM 7094 is 36, it is convenient when dealing with a large
number of loci to consider a multiple of 36.
Pseudorandom sequences are generated by the power residue method using a base of $3^{19}$. We
have tested the sequences for frequency of single integers, pairs, triplets, and quadruplets. No
autocorrelations have been detected.
For 5-locus results, direct numerical calculation of equilibria was made using the method of
LEWONTIN (1964a).
The original computer program, modified by us, and the tests of random numbers were the
work of our former colleague PROFESSOR MADHO SINGH of the State College, Oneonta, N.Y.
RESULTS
TABLE 2
Equilibrium gametic arrays from initial simulations
Each line is a gametic type. 0 and 1 are alternate alleles. Asterisks mark fixed loci.
The 36 loci are spaced in 12 groups of 3 for ease of reading only.

                        Gamete                                           Frequency
(a) r = 0.0
                                 *                          *
rep 1                   101 000 011 001 100 000 111 011 101 110 101 101  .454
                        010 110 111 010 011 101 001 000 110 101 110 010  .391
                        100 111 110 101 011 011 100 100 010 100 011 111  .155
rep 2                   001 010 101 100 011 001 101 100 101 110 010 010  .280
                        110 011 000 111 001 110 011 010 110 110 100 000  .172
                        011 100 110 001 100 111 111 110 000 011 101 101  .346
                        000 011 101 011 111 001 100 011 001 000 011 101  .024
                        100 011 011 101 110 101 000 010 111 000 011 111  .178
(b) r = .0025
                                    **                *       *
rep 1 (generation 300)  011 010 110 011 000 110 101 011 011 110 001 101  .411
                        100 101 001 010 111 001 010 101 100 000 110 010  .424
                        others                                           .165
                                                  *
rep 2 (generation 420)  001 100 100 011 101 011 111 100 100 111 000 111  .440
                        110 011 011 100 010 100 001 011 011 000 111 000  .427
                        others                                           .133
except for the two loci starred in rep 1, all the loci are segregating when all
types are in the population. Since no two gametes make a balanced pair, no indi-
vidual in the population is a complete or nearly complete heterozygote. All
individuals, even those heterozygous for gametic types, are homozygous at many
loci (a minimum of 10 in rep 1) so the mean fitness of the population is not
raised much despite the tremendous restriction in the number of gametic types.
This phenomenon is a result of the finiteness of the population. Since there is
absolutely no recombination, each gametic type is an “allele” and random drift
causes the elimination of all but a few “alleles.” If the “alleles” do not include
a perfectly balanced pair, there is no way to recover such a pair.
When there is a small amount of recombination (r = .0025), recombination
between the “alleles” can occur to produce perfectly balanced pairs (disregarding
loci completely fixed in the population). Table 2b shows such pairs making up
85% of the gametic array in each replicate. In rep 2, for example, 38% of the
population will be heterozygotes for the two main gametic types and therefore
heterozygous for 35 loci. The mean fitness, $\bar{W}$, will be high in such a population. Examination of the
detailed output from the computation shows that the partial plateau for curve 5
shown in Figure 1 between generations 260 and 320 followed by the steep rise
to the new plateau of high fitness is the result of a partially balanced set of
chromosomes being replaced by a balanced set after a rare recombination gave
rise to such a set.
The fitness among simulations with r = .0025 does not rise higher because
recombination is constantly generating unbalanced types (“others” in Table 2).
This is a recombination load. If the population were infinite in size, then a
perfectly balanced set could be selected when r = 0.0 and no recombination load
would occur. The mean fitness of the population would then be $\bar{W} = (.5)(1.0) + (.5)(.9)^{36} = .5113$,
the maximum that can be achieved by linkage. Thus the
appearance of an optimal recombination fraction is a result of the finiteness of
population size, while for an infinite population the tighter the linkage, the
higher the mean fitness at equilibrium. This is in agreement with the results of
5-locus models (LEWONTIN 1964a).
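The arithmetic behind the infinite-population limit can be checked directly. The sketch below assumes random mating between two perfectly complementary gametic types, so that heterozygotes for the pair have fitness 1 and homozygotes for either type are homozygous at every locus; the function name and its `freq` parameter are introduced here only for illustration.

```python
def balanced_pair_mean_fitness(n_loci, w_homozygote, freq=0.5):
    """Mean fitness when only two complementary gametes segregate."""
    heterozygotes = 2.0 * freq * (1.0 - freq)     # fitness 1 at every locus
    homozygotes = 1.0 - heterozygotes             # homozygous at all n_loci loci
    return heterozygotes * 1.0 + homozygotes * w_homozygote ** n_loci


print(balanced_pair_mean_fitness(36, 0.9))
```

For comparison, under free recombination and linkage equilibrium each locus contributes a factor 0.95 to mean fitness, so the same 36 loci would give 0.95 raised to the 36th power, a much lower value.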
A result that is not obvious from the tables of gametic types, but which appears
in the full output of the computer calculations, is the very close adherence of
gene frequencies at each locus to the theoretical infinite population value of 0.5.
Except for the occasional fixed locus, and except for the case of r = 0 where
chromosomes once lost can never be replaced, the variance in gene frequency
among loci is distinctly smaller for tightly linked cases than for free recombina-
tion. For the two replicates shown in Table 2b, the variance of gene frequencies
(discounting fixed loci) is .000177 and .000140, respectively, as compared with
.00239 for a case with r = 0.5.
Large numbers of loci, each with fairly large selection coefficients, can then
be kept segregating with a much lower genetic load provided they are closely
enough linked. In assessing whether r = .0025 is a reasonable map distance, it
should be remembered that this is four times the average distance we estimated
between segregating loci in D. pseudoobscura.
dramatic effects on disequilibrium occur for loci that are tightly linked, hence
the D s between adjacent loci will be most sensitive to changes in linkage dis-
equilibrium in the simulated populations.
2) $\overline{\rho^2}$, the average squared correlation coefficient between genes over all
$n(n-1)/2$ pairs of loci. The correlation between genes is defined as follows. A pair
of random variables (X, Y) is assigned to the pair of loci. The random variables
will each take the value 0 or 1 depending upon the allele at that locus in each
gamete. Thus the pairs (0,0), (0,1), (1,0), and (1,1) correspond to the gametes
ab, aB, Ab, and AB, respectively, and have the frequencies $g_{00}$, $g_{01}$, $g_{10}$, and $g_{11}$
in the gamete pool. The correlation in question is that between X and Y in the
gamete pool. The correlation between a pair of loci is related to D simply as

$\rho = D / \sqrt{p_1 p_2 (1 - p_1)(1 - p_2)}$    (1)

where $p_1$ and $p_2$ are the allelic frequencies at the two loci. With gene frequencies
at each locus equal to 0.5, $\rho^2 = 16D^2$. This measure has a range between 0 and 1
since D cannot exceed .25 in absolute value. The reasons for using the average value of $\rho^2$ will be
discussed later in this paper.
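Relation (1) is easy to verify numerically from a gametic array. The sketch below is illustrative only; the function name and argument order are assumptions, with the four arguments following the (X, Y) coding given in the text.

```python
import math


def linkage_stats(g00, g01, g10, g11):
    """Return (D, rho) for two loci from the four gametic frequencies."""
    p1 = g10 + g11                      # frequency of allele 1 (A) at the first locus
    p2 = g01 + g11                      # frequency of allele 1 (B) at the second locus
    d = g11 * g00 - g10 * g01           # linkage disequilibrium
    rho = d / math.sqrt(p1 * p2 * (1.0 - p1) * (1.0 - p2))
    return d, rho


d, rho = linkage_stats(g00=0.35, g01=0.15, g10=0.15, g11=0.35)
print(d, rho, rho ** 2, 16 * d ** 2)
```

With both gene frequencies at 0.5 the last two printed values agree, which is the identity $\rho^2 = 16D^2$ used throughout the paper.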
Because of the large number of pairwise combinations, it is impractical to make
a direct computation of D* or $\overline{\rho^2}$ when large numbers of loci are involved. We
have estimated $\overline{\rho^2}$ using a relation derived by SVED (1968). If the number of loci
heterozygous in an individual is H and the number of loci is n, then

$\mathrm{Var}(H) = n/4 + 8 \sum_{i<j} D_{ij}^2$    (2)

when gene frequencies are .5 at each locus. Then

$\overline{\rho^2} = \dfrac{4\,\mathrm{Var}(H) - n}{n(n-1)}$    (3)

In all the results, unless recombination is completely lacking (r = 0), except for
an occasional locus that goes to complete fixation, all loci maintain gene frequencies
extremely close to .5, so that approximation (3) is very good.
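A short sketch shows how relation (3) turns the variance in the number of heterozygous loci into an estimate of the average squared correlation. It assumes gene frequencies of .5 and uses the population variance of the supplied counts; the function name and the two artificial test populations are assumptions made for the example.

```python
from math import comb
from statistics import pvariance


def mean_squared_correlation(het_counts, n_loci):
    """Estimate the average rho^2 from per-individual heterozygous-locus counts."""
    var_h = pvariance(het_counts)
    return (4.0 * var_h - n_loci) / (n_loci * (n_loci - 1))


n = 10
# Linkage equilibrium at gene frequency .5: H follows a binomial(n, 1/2) distribution.
equilibrium_counts = [k for k in range(n + 1) for _ in range(comb(n, k))]
# Two complementary gametes at frequency .5 each: H is 0 or n with equal probability.
balanced_counts = [0, n] * 50

print(mean_squared_correlation(equilibrium_counts, n))
print(mean_squared_correlation(balanced_counts, n))
```

The two extreme populations recover the limits of the measure: no correlation at linkage equilibrium and complete correlation when only a balanced pair of gametes remains.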
linkage disequilibrium ($\overline{\rho^2}$ = .96). As in the earlier runs, two gametic types make
up 80% of the gametic pool, and unfixed gene frequencies are very close to .5
at all loci.
The presence of a plateau at D* = .09 in replicate 2 suggests the possibility of
more than one stable point for D* with a random event moving the system from
this lower stable point into the domain of attraction of the higher one. This is in
fact the case. Figure 3 shows the results of simulation with slightly lower linkage,
r = .005. The first run started in linkage equilibrium and after 300 generations
reached a plateau at D* = .09. Three other simulations with initial values of
D* = 0.25, 0.16, and 0.125, show that there are indeed two stable points, one at
D* = .24 and one at D* = .09 with the unstable point between them at approxi-
mately .14.
As a further check on the existence of more than one stable point, and in order
to discriminate the effect of finite population size in the Monte Carlo simulation,
we have examined the same selection model for five loci where exact results can
be computed. Figure 4 shows the result of computing trajectories of D* for five
loci with $W_1 = W_3 = 0.9$ at each locus and r = .002, .003, .004, and .005. The
2-locus exact theory predictions for these cases are that equilibrium D* = .112
for r = .002 but D* = 0 for the lesser linkages, since r = .0025 is the critical
value of linkage from 2-locus theory. Figure 4 shows, as we have already demonstrated,
that D* is greater for five loci than for two loci with D* = .220 where
r = .002 and D* = .187 for r = .003. These are both stable points since the
trajectories converge from above and below. In addition, however, D* = 0 is also
a stable point for r = .003 since the trajectory with an initial value of D* = .05
shows a decrease in D* with time. For r = .003 there are two stable points,
D* ≈ .185 and D* = 0, with the unstable point between them at D* ≈ .07. Figure 4
illustrates a general feature of multilocus symmetrical models. The range
of r is divided into three regions. For very small r there is a single stable point
with D* large. For large r there is a single point with D* = 0. For an intermediate
range, there are two stable points, one with D* large and one with D* = 0.
The existence of simultaneous stable equilibria with D = 0 and D ≠ 0 has not
been found in 2-locus theory. Both LEWONTIN and KOJIMA (1960) and KARLIN
and FELDMAN (1969) found multiple stable equilibria for symmetric models,
and LEWONTIN (1964a) found multiple equilibria numerically in asymmetric
cases, but all such multiple equilibria had D ≠ 0.
c ezNszZ(l-Z)
[z (1-z) ] *NT-l (8)
Since D = 1/ (1/ - z) (assuming all gene frequencies = 0.5)
1/2
E(D*) = 2 J s (s- s ) @(z) dz (9)
0
We have evaluated the above expression numerically. For N = 667, and s = .l,
we have
E(D*) = .0962 for r = .003
and = .0677 for r = .005
The predicted linkage disequilibrium for r = .003 falls far short of the observed
value. At the recombination fraction r = .005, we have noted that there are
apparently at least two stable equilibria, and the lower one does not differ
greatly from that predicted by equation (9). The other equilibrium (D* ≈ .24)
shows much stronger linkage disequilibrium, and cannot be explained by
genetic drift alone.
Clearly other factors need to be invoked to explain the strong linkage dis-
equilibrium observed in these simulation experiments, although genetic drift
could account for some of the initial rise in D*, especially at the looser recombination
value.
in the interactions between all pairs of loci. The following example will make
this clearer.
Consider two symmetrically overdominant loci, with fitnesses shown in
Table 3. The ratio of the fitness of an individual homozygous AA to the heterozygote Aa is

$$\frac{W_{AA}}{W_{Aa}} = \frac{g_{11}^2 w_1 w_2 + 2 g_{11} g_{10} w_1 + g_{10}^2 w_1 w_2}{g_{11}^2 + 2 g_{11} g_{10} + g_{10}^2} \Bigg/ \frac{g_{11} g_{01} w_2 + g_{11} g_{00} + g_{10} g_{01} + g_{10} g_{00} w_2}{g_{11} g_{01} + g_{11} g_{00} + g_{10} g_{01} + g_{10} g_{00}}$$

$$= (1 - s_1)\, \frac{(1 - s_2)\, p_A^2 + 2 s_2\, g_{11} g_{10}}{p_A q_A - s_2 (g_{11} g_{01} + g_{10} g_{00})} \cdot \frac{q_A}{p_A}$$    (10)

where $s_1 = 1 - w_1$; $s_2 = 1 - w_2$
and $p_A = 1 - q_A$ = frequency of the A gene.
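With gene frequencies of 0.5 at both loci, $g_{11} = g_{00} = 1/4 + D$ and $g_{10} = g_{01} = 1/4 - D$, so that

$$\frac{W_{AA}}{W_{Aa}} = (1 - s_1)\, \frac{\tfrac{1}{4}(1 - s_2) + 2 s_2 (1/16 - D^2)}{\tfrac{1}{4} - 2 s_2 (1/16 - D^2)} = (1 - s_1)\, \frac{1 - s_2/2 - 8 s_2 D^2}{1 - s_2/2 + 8 s_2 D^2}$$    (11)

Since $0 < s < 1$ and $0 \le |D| \le 1/4$, expression (11) has a maximum value of
$(1-s)$ when $D = 0$, and a minimum, $(1-s)^2$, when $D = \pm 1/4$ (taking $s_1 = s_2 = s$). For small s, (11)
is approximately

$(1 - s_1)(1 - 16 D^2 s_2)$    (12)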
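The dependence of the marginal fitness ratio on D can be checked numerically. The sketch below evaluates the ratio of marginal fitnesses directly from the gametic frequencies, assuming multiplicative fitnesses with heterozygote fitness 1 at both loci and random union of gametes; the function names are introduced here for illustration and are not from the paper.

```python
def marginal_fitness_ratio(g11, g10, g01, g00, s1, s2):
    """Marginal fitness of AA relative to Aa for two overdominant loci."""
    w1, w2 = 1.0 - s1, 1.0 - s2
    # AA individuals carry two A gametes (AB or Ab).
    w_aa = (g11 ** 2 * w1 * w2 + 2 * g11 * g10 * w1 + g10 ** 2 * w1 * w2) / (g11 + g10) ** 2
    # Aa individuals carry one A gamete and one a gamete (aB or ab).
    w_het = (g11 * g01 * w2 + g11 * g00 + g10 * g01 + g10 * g00 * w2) / (
        (g11 + g10) * (g01 + g00)
    )
    return w_aa / w_het


def symmetric_gametes(d):
    """Gametic frequencies (g11, g10, g01, g00) with all gene frequencies at 0.5."""
    return 0.25 + d, 0.25 - d, 0.25 - d, 0.25 + d


for d in (0.0, 0.1, 0.25):
    print(d, marginal_fitness_ratio(*symmetric_gametes(d), s1=0.1, s2=0.1))
```

As D rises from 0 to its maximum, the apparent selective disadvantage of the homozygote at one locus grows from its intrinsic value to the combined effect of both loci.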
It is obvious that the selection coefficients at each locus are a function not only
of the effects at each locus, but also of selective differences at all other loci which
are correlated in gene frequency. We can therefore distinguish two kinds of
contribution to the selective differences at a locus-the intrinsic and the extrinsic
selective values. The former is the effect on the fitness of an individual organism
of substituting, as by mutation, one genotype for another at a locus. For example,
in the above model the effect of substituting a homozygote at the first locus for
a heterozygote is to reduce the fitness of the organism to $(1 - s_1)$ times its former
value. The intrinsic selective value of a genotype is not necessarily independent
of the background genotype. The extrinsic selective value is determined solely
by the background genotype, and is the combined contribution of all other loci.
The resultant, often referred to as the marginal fitness (LEWONTIN and KOJIMA
1960; BODMER and FELSENSTEIN 1967), or the apparent selective value (SVED
1968) is closely related to FISHER'S (1941) concept of the average excess of a
gene substitution, as opposed to the effect of substituting one allele for another
(the average effect of a gene substitution). There are important differences, however.
FISHER'S concepts are defined for genes, not genotypes, and are a function
of the gene frequencies at the locus in question.
This phenomenon is very important when we consider more than two inter-
acting loci. Increasing the intensity of linkage disequilibrium in a block of loci
increases the selective differences at each locus, with a corresponding increase in
the interactions among the set of loci. We know from exact 2-locus theory that
epistasis can induce linkage disequilibrium, and there will be, under certain
circumstances, a positive feedback, i.e., linkage disequilibrium producing epi-
stasis which in turn causes stronger linkage disequilibrium. As we might expect,
this phenomenon is most marked when there are many segregating loci, for it is
only then that we have the potentiality for large contributions from other loci to
the selective values at a locus.
In a finite population, where initial disequilibrium is generated by sampling,
we might expect a localized increase in linkage disequilibrium which then spreads
to all loci in a block. For example, a number of loci might by chance become
highly associated, with each component temporarily having a marginal fitness
approximately equal to the product of the fitness of all associated loci. These loci
will now interact strongly with other adjacent loci, and eventually many other
loci should “crystallize” into a large supergene. Such “crystallizations” appear
to occur in the simulation experiments. Figure 5 shows this process in one run.
The abscissa represents position along the chromosome from locus 1 to locus 36
and the ordinate shows the value of D between adjacent loci for each interval.
Each curve is a “map” of D in a successively later generation. As the figure
shows, nearly all values of D are low for the first 120 generations. In generation
60 there is a rather high value at interval 7-8 and this develops into a crystalliza-
tion point. A second high point in generation 60 is at interval 15-16, but this
turns out not to be exactly at the crystallization nucleus which is interval 14-15.
Especially dramatic changes are seen to the left of interval 7-8 and to the right
of interval 14-15 where between generation 150 and 200 the correlation is
pulled from very low values to nearly unity by the presence of the “crystalliza-
tion nuclei.”
FIGURE 5.-Maps of linkage association along the 36-locus chromosome in successive generations
of simulation with N = 667, r = .003, w = 0.9. Ordinate shows D between adjacent loci;
abscissa the position on the chromosome. Dots: generation 60; crosses: generation 160; squares:
generation 170; large circles: generation 180; triangles: generation 190; small circles: generation
200.
The hypothesis outlined above accounts very well for the difference between
the simulations at different recombination values. Evidently the disequilibrium
generated by finite population size induces enough interaction between loci to
increase linkage disequilibrium deterministically for r = .003, but not for r =
.005. In the latter case the genotypic array proceeds to a new high equilibrium
value only if the average value of D* is increased to approximately 0.14. This
could be achieved by reducing the population size. This hypothesis predicts that
there will exist, for an appropriate set of selection coefficients and recombination
values, two kinds of stable equilibria for three or more loci. One of these is the
array in which all loci are in linkage equilibrium (i.e., all D = 0), and the other,
a set of equilibria with linkage disequilibrium parameters not equal to zero.
Numerical calculations with five loci support this prediction (see Figure 4).
Robustness of the system to changes in the model
1) Asymmetrical fitnesses
In the symmetrical model chosen above, the fitness of a genotype is a function
only of the number of homozygous loci, irrespective of whether they are 1/1 or
0/0. Any of the $2^{36}$ possible gametic types, and its complement, may predominate,
and the stable gametic frequencies will be indistinguishable. To put it
another way, the stability of the system is invariant to interchange of the alleles
at any locus or set of loci. This is clearly an artifact of the symmetry of the
fitnesses assigned to each locus. To investigate the effect of asymmetry on the
system, we chose a rather extreme case, namely the set of fitnesses $W_1 = 0.9375$,
$W_2 = 1$, $W_3 = 0.75$, at each locus, which in the absence of interaction at other
loci predicts a stable equilibrium gene frequency of 0.8 for the favored allele in
each case. This set of fitnesses was chosen to give the same genetic load per locus
at equilibrium as the symmetrical case with $W_1 = W_3 = 0.9$. Simulation using
an effective population size of 667 and a recombination fraction of r = .003
showed that, as before, the initial array of gametic types in linkage equilibrium
is reduced to a few predominant gametic types in roughly equal frequency.
Table 4 shows the gametic composition after 700 generations of selection. The
array shown in this table is not yet at equilibrium, the third gametic type shown
in the array having arisen from a negligible frequency only in the last 40 generations
of the run. Reduction in the variety of gametic types at this point in the
run was so slow, however, that it was terminated for reasons of economy. Nevertheless,
the result, even at this point, is obvious and not clearly different from the
symmetrical fitness runs.

TABLE 4
Gametic array near equilibrium from simulation with asymmetrical fitnesses
$W_1$ = .9375, $W_2$ = 1, $W_3$ = .75
Frequencies are 10-generation averages after 700 generations of selection
starting in linkage equilibrium.

Gamete                                           Frequency
*
111 010 110 101 110 101 001 011 110 101 101 101  .265
000 111 001 010 101 110 110 111 101 111 010 110  .134
000 111 001 010 101 110 110 111 101 111 010 111  .108
111 101 111 101 111 011 111 100 111 010 111 011  .111
Others                                           .392
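The single-locus predictions quoted for the asymmetrical fitnesses follow from standard overdominance theory, and a brief sketch makes the comparison with the symmetrical case explicit. It assumes a heterozygote fitness of 1; the function name is hypothetical.

```python
def single_locus_equilibrium(w1, w3):
    """Equilibrium frequency of allele 0 and genetic load for one heterotic locus."""
    s1, s3 = 1.0 - w1, 1.0 - w3          # selection against 0/0 and 1/1 homozygotes
    freq_allele_0 = s3 / (s1 + s3)
    load = s1 * s3 / (s1 + s3)
    return freq_allele_0, load


print(single_locus_equilibrium(0.9375, 0.75))   # asymmetrical case
print(single_locus_equilibrium(0.9, 0.9))       # symmetrical case
```

Both fitness sets give the same load per locus, so differences between the two kinds of run cannot be attributed to a difference in the total intensity of selection.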
All $2^{35}$ distinguishable equilibria in the asymmetrical case with non-zero
linkage disequilibrium appear to be stable, at least insofar as we were able to
push runs close to equilibrium. On the average, one of these will be chosen, by
chance, with an approximately equal number of favored and unfavored alleles
on each gamete.
One effect of asymmetry was that there was a greater occurrence of fixation
at each locus, despite the fact that at equilibrium all the unfixed loci are near a
gene frequency of .5. This suggests that all the $2^{35}$ equilibria are not equally
stable in the sense that the returning force toward equilibrium from a random
perturbation is not equally strong. In order to better understand the multiple
equilibria in the asymmetric cases, we returned to a 5-locus model for which
exact numerical evaluation is possible. The model had the same selection and
linkage parameters as the 36-locus asymmetrical case. Because of the existence
of multiple stable states of unknown domains of stability, we used a numerical
method that traces the trajectory of gametic frequencies from some initial fre-
quency array and then finds the stable point in the domain of that trajectory
by a method similar to the technique of "hill-climbing." As initial conditions
for each case we chose a different pair of complementary gametes such as 00000
and 11111 or 01101 and 10010, gave them each a frequency of .5 and searched
for the stable point. Because all loci are identical in action, there is complete
symmetry between loci 1 and 5, and between loci 2 and 4. Thus, there are only
ten distinguishable complementary pairs and we looked for stable points corre-
sponding to all these. The complete gametic array of all 32 gametic types for all
ten cases is too extensive to display, but Table 5 attempts to summarize their
main features. The first column symbolizes the initial condition from which the
stable state was reached, i.e., 00000 means that the initial gametic frequencies
were .5 of that gametic type and .5 of its complement. The next two columns
show the four most common gametic types at equilibrium and their frequency.
The next four columns show the values of $\rho^2$ between adjacent genes. The eighth
column gives the frequency of the allele 1 at locus 3 (it differs slightly over loci
depending upon distance from the end of the chromosome) and the last column
gives the mean fitness at equilibrium.
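The count of ten distinguishable starting pairs can be confirmed by enumeration. The sketch below treats a gamete, its complement, and their mirror images along the chromosome as equivalent, which is the symmetry described above for identical, equally spaced loci; the function name is an assumption of this example.

```python
from itertools import product


def distinguishable_pairs(n_loci):
    """One representative gamete for each complementary pair, up to chromosome reversal."""
    representatives = set()
    for gamete in product((0, 1), repeat=n_loci):
        complement = tuple(1 - allele for allele in gamete)
        # A pair and its mirror image along the chromosome are indistinguishable.
        variants = (gamete, complement, gamete[::-1], complement[::-1])
        representatives.add(min(variants))
    return sorted(representatives)


pairs = distinguishable_pairs(5)
print(len(pairs))
for gamete in pairs:
    print("".join(map(str, gamete)))
```

Of the 16 complementary pairs for five loci, the four built from palindromic gametes are their own mirror images and the remaining twelve fall into six mirror-image couples, giving ten cases in all.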
Table 5 shows that in 7 out of 10 cases, stable states with linkage disequilibrium
were found while in 3 cases only linkage equilibrium is stable. The cases have
been listed in order of decreasing $\bar{W}$. This is also the order of decreasing deviation
of gene frequencies from the single-locus prediction of 0.2 : 0.8. As the first
columns show, the stable points with the highest values of $\bar{W}$ and greatest deviation
of gene frequencies toward .5, and therefore the most stable against random
fixation, are those corresponding to what might be called "coherent" chromosomes,
those in which 0 and 1 alleles are in blocks rather than scattered and
intermixed. The chromosomes with dispersed 1 and 0 alleles are those with a
high "recombination index" according to FRASER (1967). He pointed out that
for normalizing selection, the higher the recombination index, as measured by
the number of switches from 1 to 0 and from 0 to 1 along the chromosome, the
smaller the loss in fitness because of "recombination load." In our case, with
heterotic selection we observe the opposite effect, coherent chromosomes showing
the greatest fitness, probably because of the asymmetry of fitnesses.

TABLE 5
Alternate stable states for a 5-locus asymmetrical model
$W_1$ = .9375, $W_2$ = 1, $W_3$ = .75, and r = .003.
Frequency of allele 1 at locus 3 is p(3).

               Most common  Frequency at   $\rho^2$ between adjacent loci
Initial pair   gametes      equilibrium    1,2     2,3     3,4     4,5     p(3)     $\bar{W}$
00000          00000        .6683          .7210   .7919   .7919   .7210   .2481    .7876
               11111        .1637
               00001        .0221
               10000        .0221
00101,         00000        .3277          0.0     0.0     0.0     0.0     .20000   .7738
01001,         00001        .0819
01010          00010        .0819
               00100        .0819
               01000        .0819
               10000        .0819
We call special attention to the phenomenon of equilibrium gene frequencies
being closer to .5 when there is linkage. While the effect is very pronounced in
36-locus models (Table 4), the 5-locus models show it (Table 5) and so do all the
asymmetrical models discussed by LEWONTIN (1964a,b), as well as the sym-
metrical model of BODMER and PARSONS (1962). The reason for the effect is clear.
As linkage tightens, we are more and more dealing with a supergene with
"pseudoalleles." While the equilibrium predicted for single alleles with homozygous
fitnesses .9375 and .75 is

$\hat{q} = \dfrac{s}{s+t} = \dfrac{.0625}{.0625 + .2500} = .20,$

the equilibrium predicted for "pseudoalleles" is

$\hat{q} = \dfrac{s}{s+t} = \dfrac{.90207}{.90207 + .99997} = .473$

In general:

$\hat{q} = \dfrac{1 - W_1^{n}}{2 - (W_1^{n} + W_3^{n})}$

so $\lim_{n \to \infty} \hat{q} = 1/2$ if $0 < W_1, W_3 < 1$.
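The convergence of the equilibrium frequency toward one half can be traced by treating a block of n completely linked loci as a single "pseudoallele." The sketch below is a direct evaluation of the general expression above under that assumption; the function name is hypothetical.

```python
def pseudoallele_frequency(w1, w3, n_loci):
    """Equilibrium frequency of the less favored pseudoallele for n linked loci."""
    s = 1.0 - w1 ** n_loci      # selection against the favored chromosomal homozygote
    t = 1.0 - w3 ** n_loci      # selection against the unfavored chromosomal homozygote
    return s / (s + t)


for n in (1, 5, 36, 360):
    print(n, round(pseudoallele_frequency(0.9375, 0.75, n), 4))
```

Even five linked loci move the expected frequency well away from the single-locus value of .20, and by 36 loci it is already close to the limit of one half.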
This effect means that tight linkage and strong linkage disequilibrium is a
preserver of genic variation and that relative selective values of single gene substitutions
cannot be judged from allele frequencies. The frequencies of polymorphic
alleles observed by LEWONTIN and HUBBY (1966) in their study of
enzyme polymorphism, thus, bear no necessary relationship to the relative selective
values of the various genotypes when viewed as single locus effects. The
selection of the chromosome as a whole is the overriding determiner of allelic
frequencies.
2) Unequally spaced loci
We do not expect in natural populations that a sample of heterotic loci will
be equally spaced along a chromosome. Accordingly we investigated a model in
which the positions of the loci were determined by a sample from a uniform
distribution between 0 and .006 with the average distance between loci being
0.003. There was very little difference in the gametic array compared to the runs
in which loci were equally spaced, and, if anything, the average value of D* was
a little greater.
TABLE 6
Equilibrium gametic arrays from proportional selection model of KING,
with w = .20 and $h^2$ = .10, r = .003

Gamete                                           Frequency
Run 1, N = 1000, started in linkage equilibrium
000 101 110 001 001 010 101 111 110 011 001 000  .406
111 010 001 110 110 001 000 000 001 100 110 111  .405
Others                                           .189
Run 2, N = 800, started from complete coupling
111 111 111 111 111 111 111 111 111 111 111 111  .401
000 000 000 000 000 000 000 000 000 000 000 000  .380
Others                                           .219
negligible. We cannot, however, continue to increase the number of loci indefi-
nitely while holding the effect of a gene substitution constant. Any realistic
view of the genome must suppose that as the number of segregating loci affecting
fitness increases, the effect of a single gene substitution must decrease.
Suppose we have a chromosome of some fixed map length l. Moreover, suppose
that the fitness of chromosomal homozygotes for this element is K as compared
to unity for a chromosomal heterozygote. (We are again assuming a symmetrical
model so that all chromosomal homozygotes have the same fitness). What will
happen as we pack more and more segregating genes into this fixed map length
with a fixed amount of inbreeding depression? Two compensatory effects will
occur. As more genes are packed into the linkage group, the recombination distance
between adjacent loci will decrease linearly with gene number, but the
effect of a gene substitution on fitness will grow smaller. In a multiplicative model,
the fitness, 1 - s, of a homozygote at any one locus will be

$1 - s = K^{1/n}$  or  $\ln(1 - s) = \dfrac{\ln K}{n}$

where n is the number of loci. For large n, 1 - s will be very close to 1 so that the
effect of a single gene substitution will be

$s \approx -\dfrac{\ln K}{n}$
That is, it will decrease in proportion to 1/n. The degree of epistasis as measured by
the deviation, $\epsilon$, from additivity is proportional to $s^2$, however, so epistasis decreases
as $1/n^2$. Now 2-locus theory suggests that the effect of linkage and selection
as measured by correlation between loci depends upon the ratio $\epsilon/r$. If this ratio
is large either because epistasis is great or because recombination is rare, there
will be a large correlation between loci. If it falls below a critical value, there
will be no correlation at equilibrium. For the present case, then, $\epsilon/r \sim n/n^2 \sim 1/n$.
Thus 2-locus theory predicts that as we increase the number of loci packed
into a fixed chromosome length with a fixed inbreeding depression, the correlation
between loci should decrease. This does not take into account, however, the
higher-order interactions. Are these sufficient to counterbalance the decrease in
effect, or will the importance of linkage grow vanishingly small?
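The scaling argument can be written out step by step. This is only a sketch under the text's own assumptions (multiplicative fitnesses and equally spaced loci); it additionally assumes that the recombination fraction $r$ between adjacent loci is roughly $l/n$ for large $n$, and it leaves the proportionality constants unspecified, as the text does.

```latex
\begin{align*}
\ln(1-s) &= \frac{\ln K}{n}
  \quad\Longrightarrow\quad
  s = 1 - K^{1/n}
    = -\frac{\ln K}{n} - \frac{(\ln K)^2}{2n^2} - \cdots
    \approx -\frac{\ln K}{n}, \\
E &\propto s^2 \approx \frac{(\ln K)^2}{n^2},
  \qquad r \approx \frac{l}{n}, \\
\frac{E}{r} &\propto \frac{(\ln K)^2 / n^2}{l / n}
  = \frac{(\ln K)^2}{l} \cdot \frac{1}{n}.
\end{align*}
```

On this first-order view, holding $K$ and $l$ fixed while $n$ grows drives the ratio toward zero, which is the 2-locus prediction that the numerical work is meant to test.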
To answer these questions we have carried out a large set of numerical calculations
and simulations with two different values of $K$ (.0225 and .4832), a range
of total map length up to 60 centimorgans, and $n$ = 2, 5, 18, 36, and 360.
The total map length is measured by
$$l = (n+1)r$$
This expression ignores the nonlinear relationship between map length and
recombination values. There will, however, only be a significant discrepancy
when we are dealing with large chromosome segments, and we are concerned
in this paper primarily with small recombination fractions between loci. We
For five loci, exact numerical computation was employed and $\bar{\rho}^2$ was directly
calculated over all pairs of loci. For 18, 36, and 360 loci, simulation was used
and $\bar{\rho}^2$ was estimated from the variance relation (3) given earlier in this paper.
The results of these calculations are shown in Figures 6a and 6b. In 6a, $K$ =
0.0225 while in 6b, $K$ = 0.4832. Table 7 gives the fitness of a single-locus
homozygote for these two cases with different numbers of loci from relation (13).
Attention is called to the extremely small fitness effects per single locus for large
numbers of loci.
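The per-locus fitnesses can be recomputed from the multiplicative relation $1 - s = K^{1/n}$. The following is a minimal sketch, not the authors' program; the function names `per_locus_fitness` and `approx_selection` are hypothetical and chosen here for illustration.

```python
import math


def per_locus_fitness(K: float, n: int) -> float:
    """Fitness 1 - s of a single-locus homozygote when the n-locus homozygote has fitness K."""
    return K ** (1.0 / n)


def approx_selection(K: float, n: int) -> float:
    """Large-n approximation to the per-locus selection coefficient, s ~ -ln(K)/n."""
    return -math.log(K) / n


if __name__ == "__main__":
    # One row per number of loci, one column per value of K used in the text.
    for n in (2, 5, 18, 36, 360):
        row = [per_locus_fitness(K, n) for K in (0.0225, 0.4832)]
        print(f"{n:>4d}  " + "  ".join(f"{w:.4f}" for w in row))
```

The computed values should agree with the tabulated ones to within about one unit in the last printed digit, and for 360 loci the approximation $-\ln K / n$ differs from the exact $s$ by less than $10^{-4}$.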
Figures 6a and 6b show a remarkable phenomenon. As predicted by 2-locus
theory, the line relating $\bar{\rho}^2$ to $l$ has a negative slope and the 5-locus line lies
entirely beneath the 2-locus line. That is, increasing the number of loci in the
interval and sharing out the fitness among them does lessen the effect of the
linkage as measured by $\bar{\rho}^2$ for any given map length. When we pass from 5 to
18 loci, there is again a decrease in both cases but a much smaller one than when
passing from 2 to 5 loci. When the loci are now doubled to 36, there is only a
barely perceptible change in the line in the case of strong selection and a small
change for weak selection. Finally, for 360 loci there is no change at all for the
stronger selection, while for the weak selection the simulation gives such high
variance between generations that it was not possible to identify an equilibrium
condition. Thus we have a property of invariance appearing as the number of
TABLE 7
Fitness of a homozygote at a single locus among n loci when
the fitness of the n-tuple homozygote = K

              K
   n      .0225    .4832
   2      .1501    .6951
   5      .4683    .8646
  18      .8100    .9604
  36      .9000    .9800
 360      .9895    .9980
FIGURE 6.--The relation between the average correlation among all pairs of loci, on the
ordinate, and total map length, on the abscissa, for different numbers of loci making up the total
map. (a) Strong selection: K = 0.0225; (b) weak selection: K = 0.4832. Solid line: 2 loci; broken
line: 5 loci; dashed line: 18 loci (solid circles), 36 loci (crosses), 360 loci (open circles).
loci increases. For more than a couple of dozen loci, the average correlation between
genes on the chromosome is independent of the number of genes or their
individual effects and depends only on the total map length of the chromosomes
and the total inbreeding depression. Apparently, higher-order effects do come
into play in a significant way with many genes so as to just cancel out the weakening
of first-order effects. Once again 2-locus theory is seen to be inadequate in an
important way for prediction of multilocus phenomena.
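For comparison, the 2-locus prediction for a pair of adjacent loci can be evaluated directly. The sketch below assumes the standard symmetric two-locus model with multiplicative fitnesses (double heterozygote 1, single homozygote $1-s$, double homozygote $(1-s)^2$). In that model the symmetric equilibria with linkage disequilibrium satisfy $D = \pm\frac{1}{4}\sqrt{1 - 4r/s^2}$ and exist only when $r < s^2/4$, so the squared correlation is $1 - 4r/s^2$ below that critical value; the sketch takes it to be zero otherwise. This closed form, the function names, and the use of the text's relation $l = (n+1)r$ to obtain $r$ are assumptions of the sketch, not formulas quoted from the paper.

```python
def two_locus_rho_sq(s: float, r: float) -> float:
    """Squared correlation at the symmetric two-locus equilibrium (assumed closed form)."""
    epistasis = s * s                 # deviation from additivity: (1-s)^2 + 1 - 2(1-s)
    if r >= epistasis / 4.0:          # at or above the critical recombination value
        return 0.0
    return 1.0 - 4.0 * r / epistasis


def adjacent_pair_prediction(K: float, l: float, n: int) -> float:
    """Two-locus prediction for adjacent loci when n loci share map length l and fitness K."""
    s = 1.0 - K ** (1.0 / n)          # per-locus homozygote disadvantage
    r = l / (n + 1)                   # adjacent-locus recombination, from l = (n+1)r
    return two_locus_rho_sq(s, r)


if __name__ == "__main__":
    # Strong selection (K = 0.0225) on a 9-centimorgan segment (l = 0.09).
    for n in (2, 5, 18, 36, 360):
        print(n, round(adjacent_pair_prediction(0.0225, 0.09, n), 3))
```

Under these assumptions the 2-locus value for adjacent loci is below 0.05 with 36 loci and zero with 360 loci, whereas the value read from Figure 6 for strong selection at this map length is about .8.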
While we have shown that $\bar{\rho}^2$ depends only on two parameters, $l$ and $K$, it does
not depend simply on the single parameter, selection per unit map length. For
example, from Figure 6 we see that for 36 loci in the case of strong selection,
$\bar{\rho}^2$ = .8 when homozygous load per centimorgan is .9775/9 = .109. The same
homozygous load per centimorgan for weak selection occurs for a map length of
4.7 centimorgans, where $\bar{\rho}^2$ = 0 for 36 loci.
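The arithmetic behind this comparison can be checked with a short calculation. This is an illustrative sketch only; it takes the homozygous load to be $1 - K$ and divides by map length in centimorgans, and the function names are hypothetical.

```python
def load_per_centimorgan(K: float, map_length_cm: float) -> float:
    """Homozygous load (1 - K) divided by total map length in centimorgans."""
    return (1.0 - K) / map_length_cm


def map_length_for_load(K: float, load_per_cm: float) -> float:
    """Map length (cM) at which chromosomal fitness K gives the stated load per centimorgan."""
    return (1.0 - K) / load_per_cm


if __name__ == "__main__":
    strong = load_per_centimorgan(0.0225, 9.0)         # strong selection, 9 cM
    weak_length = map_length_for_load(0.4832, strong)  # weak selection, same load per cM
    print(f"load per cM: {strong:.3f}; matching weak-selection map length: {weak_length:.2f} cM")
```

The load per centimorgan comes out close to .109, and the matching map length under weak selection falls between 4.7 and 4.8 centimorgans, in line with the figure quoted in the text.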
In the data presented in this paper the strong association between loci is a
result of the predominance in the gametic array of only two gametic types, and
this prediction does not appear to be compatible with the observed chromosome
variation in natural populations. There are several possible explanations for this
which are compatible with the above theory. One is that the whole chromosome
does not form a single super locus, but condenses into a series of complexes which
segregate more or less independently. Another factor, not introduced in the
simulations, is the possibility of multiple allelic overdominant systems, and these
offer the potential for a much greater array of chromosome types.
The final question we must concern ourselves with is a difficult one. Are
selection coefficients of sufficiently great magnitude to produce this phenomenon
in natural populations? It is difficult not only because we have little information
about an essential parameter of the system, namely the effect on fitness of
homozygosity for a segment of chromosome, but also because we do not know if the
multiplicative model is a realistic one for the interaction of loci affecting fitness.
Moreover, we do not even know whether the most basic feature of all the models
discussed, single-locus overdominance, is the rule in nature. All our models have
assumed single-locus overdominance in order that the gene frequencies will go
to some intermediate equilibrium. There is nothing about work on linkage either
in this paper or any previous paper that suggests that linkage alone can stabilize
gene frequencies in the absence of overdominance. In fact the reverse is true. It
has been shown, on the other hand, by LEWONTIN (1964b) that even in the
absence of overdominance the fixation of gene frequencies by selection may be
tremendously retarded by linkage. While overdominance at each locus has been
inserted in the present models simply in order to maintain genetic variance
indefinitely, we do not know at the present time how important it is for the effects
we have observed. For example, in the absence of overdominance but with ex-
tremely small selection coefficients favoring one homozygote over the other, the
change in gene frequencies might be extremely slow and all during the course of
transient polymorphism the striking linkage effects we have observed may occur.
It remains for further investigation to show how essential the assumption of over-
dominance is.
Let us consider as an example a chromosome arm in Drosophila with a map
length of 0.4 morgans. Because there is no crossing over in the male, this
corresponds to $l$ = 0.2 in the graphs. How much selection would be necessary to
ensure a high degree of linkage disequilibrium among all the loci on the arm, say,
with 7" 0.5? We can easily calculate the upper bound for K , using the formula
for two loci
SUMMARY
This study examines systems of 2, 5, 18, 36, and 360 segregating loci affecting
fitness. It chiefly deals with a model in which heterosis is multiplicative between
loci, but also deals with proportional selection, asymmetrical fitness selection
between alleles, and genes unequally spaced on the chromosome. The results
turn out to be robust to these variations in the model. The findings are that
(1) Multiple stable equilibria exist with different degrees of correlation between
loci. (2) Even when effects of single allelic substitution on fitness are so small
that exact 2-locus theory predicts no effect of linkage, there is a strong correlation
between genes when many loci are segregating. (3) When fitnesses are strongly
asymmetric at each locus, the equilibrium of gametic frequencies is such as to
make allelic frequencies nearly .5. Thus, allele frequencies at individual loci do
not reflect fitness effects at separate loci, but are a result of selection of the
chromosome as a whole. (4) Higher-order interactions among loci in multilocus
systems increase in importance as the number of loci increases. (5) For a fixed
map length and a fixed amount of inbreeding depression per unit map length, the
equilibrium correlation among genes along the chromosome is independent of
the number of genes or their individual effects. This last point makes it possible
to frame a theory of population genetics which does not contain individual loci
explicitly, but deals only with whole chromosomes, their recombination proper-
ties, and the effect of homozygosity of segments of various length. Such a theory
is more consonant with the observations possible in population genetics than a
theory framed in terms of gene frequencies.
LITERATURE CITED
BODMER, W. F. and J. FELSENSTEIN, 1967 Linkage and selection: Theoretical analysis of the
deterministic two locus random mating model. Genetics 57: 237-265.
BODMER, W. F. and P. PARSONS, 1962 Linkage and recombination in evolution. Advan. Genet.
11: 1-100.
FISHER, R. A., 1918 The correlation between relatives on the supposition of Mendelian
inheritance. Trans. Roy. Soc. Edinburgh 52: 399-433. --, 1941 Average excess and average
effect of a gene substitution. Ann. Eugenics 11: 53-63.
FRASER, A., 1967 Gametic disequilibrium in multi-genic systems under normalizing selection.
Genetics 55: 507-512.
HILL, W. G. and A. ROBERTSON, 1968 Linkage disequilibrium in finite populations. Theoret.
Appl. Genet. 38: 226-231.
KARLIN, S. and M. FELDMAN, 1969 Linkage and selection: New equilibrium properties of the
two locus symmetric viability model. Proc. Natl. Acad. Sci. U.S. 62: 70-74.
KIMURA, M., 1965 Attainment of quasi-linkage equilibrium when gene frequencies are
changing by natural selection. Genetics 52: 875-890.
KING, J. L., 1967 Continuously distributed factors affecting fitness. Genetics 55: 483-492.
KOJIMA, K., 1967 Likelihood of establishing newly induced inversion chromosomes in small
populations. Ciencia e Cultura 19: 67-77.
LEWONTIN, R. C., 1964a The interaction of selection and linkage. I. General considerations;
heterotic models. Genetics 49: 49-67. --, 1964b The interaction of selection and
linkage. II. Optimum models. Genetics 50: 757-782.
LEWONTIN, R. C. and J. L. HUBBY, 1966 A molecular approach to the study of genic
heterozygosity in natural populations. II. Amount of variation and degree of heterozygosity in
natural populations of Drosophila pseudoobscura. Genetics 54: 595-609.
LEWONTIN, R. C. and K. KOJIMA, 1960 The evolutionary dynamics of complex polymorphisms.
Evolution 14: 458-472.
MAYR, E., 1963 Animal Species and Evolution. Harvard Univ. Press, Cambridge, Mass.
MILKMAN, R. D., 1967 Heterosis as a major cause of heterozygosity in nature. Genetics 55:
493-495.
OHTA, T. and M. KIMURA, 1969 Linkage disequilibrium due to random genetic drift. Genet.
Res. 13: 47-55.
PRAKASH, S., R. C. LEWONTIN and J. L. HUBBY, 1969 A molecular approach to the study of
genic heterozygosity in natural populations. IV. Patterns of genic variation in central,
marginal, and isolated populations of Drosophila pseudoobscura. Genetics 61: 841-858.
SVED, J. A., 1968 The stability of linked systems of loci with small population size. Genetics
59: 543-563.
SVED, J. A., E. REED and W. F. BODMER, 1967 The numbers of balanced polymorphisms that can
be maintained in a natural population. Genetics 55: 469-481.
WILLS, C., J. CRENSHAW and J. VITALE, 1970 A computer model allowing maintenance of large
amounts of genetic variability in Mendelian populations. I. Assumptions and results for
large populations. Genetics 64: 107-123.
WRIGHT, S., 1967 Surfaces of selective value. Proc. Natl. Acad. Sci. U.S. 58: 165-172.