1517 Hardy-Weinberg Principle
The sum of the entries is $p^2 + 2pq + q^2 = 1$, as the genotype frequencies must sum to one.

Note again that as $p + q = 1$, the binomial expansion of $(p + q)^2 = p^2 + 2pq + q^2 = 1$ gives the same relationships.

Summing the elements of the Punnett square or the binomial expansion, we obtain the expected genotype proportions among the offspring after a single generation:

\[ f_1(AA) = p^2, \qquad f_1(Aa) = 2pq, \qquad f_1(aa) = q^2 \]

These frequencies define the Hardy–Weinberg equilibrium. Note that the genotype frequencies after the first generation need not equal the genotype frequencies from the initial generation, e.g. $f_1(AA) \neq f_0(AA)$. However, the genotype frequencies for all future times will equal the Hardy–Weinberg frequencies, e.g. $f_t(AA) = f_1(AA)$ for $t > 1$. This follows since the genotype frequencies of the next generation depend only on the allele frequencies of the current generation which, as calculated by equations (1) and (2), are preserved from the initial generation:

\[ f_1(A) = f_1(AA) + \tfrac{1}{2} f_1(Aa) = p^2 + pq = p(p + q) = p = f_0(A) \]
\[ f_1(a) = f_1(aa) + \tfrac{1}{2} f_1(Aa) = q^2 + pq = q(p + q) = q = f_0(a) \]

One considers the six unique diploid-diploid combinations:

[(AA, AA), (AA, Aa), (AA, aa), (Aa, Aa), (Aa, aa), (aa, aa)]

and constructs a Punnett square for each, so as to calculate its contribution to the next generation's genotypes. These contributions are weighted according to the probability of each diploid-diploid combination, which follows a multinomial distribution with k = 3. For example, the probability of the mating combination (AA, aa) is $2 f_t(AA) f_t(aa)$ and it can only result in the Aa genotype: [0, 1, 0]. Overall, the resulting genotype frequencies are calculated as:

\[
\begin{aligned}
&[f_{t+1}(AA),\ f_{t+1}(Aa),\ f_{t+1}(aa)] \\
&\quad = f_t(AA) f_t(AA)\,[1, 0, 0] + 2 f_t(AA) f_t(Aa)\,[\tfrac{1}{2}, \tfrac{1}{2}, 0] + 2 f_t(AA) f_t(aa)\,[0, 1, 0] \\
&\qquad + f_t(Aa) f_t(Aa)\,[\tfrac{1}{4}, \tfrac{1}{2}, \tfrac{1}{4}] + 2 f_t(Aa) f_t(aa)\,[0, \tfrac{1}{2}, \tfrac{1}{2}] + f_t(aa) f_t(aa)\,[0, 0, 1] \\
&\quad = \left[ \left(f_t(AA) + \tfrac{1}{2} f_t(Aa)\right)^2,\ 2\left(f_t(AA) + \tfrac{1}{2} f_t(Aa)\right)\left(f_t(aa) + \tfrac{1}{2} f_t(Aa)\right),\ \left(f_t(aa) + \tfrac{1}{2} f_t(Aa)\right)^2 \right] \\
&\quad = \left[ f_t(A)^2,\ 2 f_t(A) f_t(a),\ f_t(a)^2 \right]
\end{aligned}
\]

As before, one can show that the allele frequencies at time t+1 equal those at time t, and so are constant in time. Similarly, the genotype frequencies depend only on the allele frequencies, and so, after time t = 1, are also constant in time.

If, in either monoecious or dioecious organisms, either the allele or genotype proportions are initially unequal in either sex, it can be shown that constant proportions are obtained after one generation of random mating. If dioecious organisms are heterogametic and the gene locus is located on the X chromosome, it can be shown that if the allele frequencies are initially unequal in the two sexes [e.g., XX females and XY males, as in humans], f′(a) in the heterogametic sex 'chases' f(a) in the homogametic sex of the previous generation, until an equilibrium is reached at the weighted average of the two initial frequencies.

2 Deviations from Hardy–Weinberg equilibrium

Hardy–Weinberg equilibrium rests on seven underlying assumptions.[3]

Violations of the Hardy–Weinberg assumptions can cause deviations from expectation. How this affects the population depends on the assumptions that are violated.

• Random mating. The HWP states the population will have the given genotypic frequencies (called Hardy–Weinberg proportions) after a single generation of random mating within the population. When the random mating assumption is violated, the population will not have Hardy–Weinberg proportions. A common cause of non-random mating is inbreeding, which causes an increase in homozygosity for all genes.

If a population violates one of the following four assumptions, the population may continue to have Hardy–Weinberg proportions each generation, but the allele frequencies will change over time.
• Selection, in general, causes allele frequencies to change, often quite rapidly. While directional selection eventually leads to the loss of all alleles except the favored one, some forms of selection, such as balancing selection, lead to equilibrium without loss of alleles.
• Migration genetically links two or more populations together. In general, allele frequencies will become more homogeneous among the populations. Some models for migration inherently include nonrandom mating (Wahlund effect, for example). For those models, the Hardy–Weinberg proportions will normally not be valid.

4.1 Generalization for more than two alleles

[Figure: Punnett square for three-allele case (left) and four-allele case (right). White areas are homozygotes. Colored areas are heterozygotes.]

Consider an extra allele frequency, r. The two-allele case is the binomial expansion of $(p + q)^2$, and thus the three-allele case is the trinomial expansion of $(p + q + r)^2$.
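The expansion can be checked numerically. The following is a minimal Python sketch, not part of the original article: the function name `genotype_frequencies` and the example allele labels and frequencies are illustrative assumptions. It enumerates the unordered allele pairs and assigns each homozygote the squared allele frequency and each heterozygote twice the product, which are the terms of the expanded square.

```python
from itertools import combinations_with_replacement


def genotype_frequencies(allele_freqs):
    """Expected Hardy-Weinberg genotype frequencies for any number of alleles.

    allele_freqs maps an allele label to its frequency (hypothetical input
    format chosen for this sketch); the frequencies must sum to one.
    """
    total = sum(allele_freqs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to one")
    result = {}
    # Each unordered pair of alleles is one genotype.
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        if a == b:
            # Homozygote: squared term of the expansion.
            result[(a, b)] = allele_freqs[a] ** 2
        else:
            # Heterozygote: cross term, counted twice.
            result[(a, b)] = 2 * allele_freqs[a] * allele_freqs[b]
    return result


# Illustrative values only: two-allele and three-allele cases.
two = genotype_frequencies({"A": 0.7, "a": 0.3})
three = genotype_frequencies({"A": 0.5, "B": 0.3, "C": 0.2})
```

For two alleles the function returns the familiar three terms of the binomial expansion, and for three alleles it returns six genotypes (three homozygotes and three heterozygotes) whose frequencies again sum to one.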
is 3.84, and since the $\chi^2$ value is less than this, the null hypothesis that the population is in Hardy–Weinberg frequencies is not rejected.

6.2 Fisher's exact test (probability test)

Fisher's exact test can be applied to testing for Hardy–Weinberg proportions. Since the test is conditional on the allele frequencies, p and q, the problem can be viewed as testing for the proper number of heterozygotes. In this way, the hypothesis of Hardy–Weinberg proportions is rejected if the number of heterozygotes is too large or too small. The conditional probabilities for the heterozygote, given the allele frequencies, are given in Emigh (1980) as

\[ \operatorname{prob}[n_{12} \mid n_1] = \frac{\binom{n}{n_{11},\, n_{12},\, n_{22}}}{\binom{2n}{n_1,\, n_2}}\; 2^{n_{12}}, \]

where $n_{11}$, $n_{12}$, $n_{22}$ are the observed numbers of the three genotypes, AA, Aa, and aa, respectively, and $n_1$ is the number of A alleles, where $n_1 = 2n_{11} + n_{12}$.

An example. Using one of the examples from Emigh (1980),[4] we can consider the case where n = 100 and p = 0.34. The possible observed heterozygotes and their exact significance levels are given in Table 4.

Using this table, one must look up the significance level of the test based on the observed number of heterozygotes. For example, if one observed 20 heterozygotes, the significance level for the test is 0.007. As is typical for Fisher's exact test for small samples, the gradation of significance levels is quite coarse.

However, a table like this has to be created for every experiment, since the tables are dependent on both n and p.

7 Inbreeding coefficient

The inbreeding coefficient, F (see also F-statistics), is one minus the observed frequency of heterozygotes over that expected from Hardy–Weinberg equilibrium.

\[ F = \frac{E(f(Aa)) - O(f(Aa))}{E(f(Aa))} = 1 - \frac{O(f(Aa))}{E(f(Aa))}, \]

where the expected value from Hardy–Weinberg equilibrium is given by

\[ E(f(Aa)) = 2pq. \]

For two alleles, the chi-squared goodness of fit test for Hardy–Weinberg proportions is equivalent to the test for inbreeding, F = 0.

The inbreeding coefficient is unstable as the expected value approaches zero, and thus not useful for rare and very common alleles. For E = 0 and O > 0, F = −∞; for E = 0 and O = 0, F is undefined.

8 History

Mendelian genetics was rediscovered in 1900. However, it remained somewhat controversial for several years as it was not then known how it could cause continuous characteristics. Udny Yule (1902) argued against Mendelism because he thought that dominant alleles would increase in the population.[5] The American William E. Castle (1903) showed that without selection, the genotype frequencies would remain stable.[6] Karl Pearson (1903) found one equilibrium position with values of p = q = 0.5.[7] Reginald Punnett, unable to counter Yule's point, introduced the problem to G. H. Hardy, a British mathematician, with whom he played cricket. Hardy was a pure mathematician and held applied mathematics in some contempt; his view of biologists' use of mathematics comes across in his 1908 paper where he describes this as "very simple":[8]

    To the Editor of Science: I am reluctant to intrude in a discussion concerning matters of which I have no expert knowledge, and I should have expected the very simple point which I wish to make to have been familiar to biologists. However, some remarks of Mr. Udny Yule, to which Mr. R. C. Punnett has called my attention, suggest that it may still be worth making...

    Suppose that Aa is a pair of Mendelian characters, A being dominant, and that in any given generation the number of pure dominants (AA), heterozygotes (Aa), and pure recessives (aa) are as p:2q:r. Finally, suppose that the numbers are fairly large, so that mating may be regarded as random, that the sexes are evenly distributed among the three varieties, and that all are equally fertile. A little mathematics of the multiplication-table type is enough to show that in the next generation the numbers will be as $(p + q)^2 : 2(p + q)(q + r) : (q + r)^2$, or as $p_1 : 2q_1 : r_1$, say.
The principle was thus known as Hardy's law in the English-speaking world until 1943, when Curt Stern pointed out that it had first been formulated independently in 1908 by the German physician Wilhelm Weinberg.[9][10] William Castle in 1903 also derived the ratios for the special case of equal allele frequencies, and it is sometimes (but rarely) called the Hardy–Weinberg–Castle Law.

Hardy's statement begins with a recurrence relation for the frequencies p, 2q, and r. These recurrence relations follow from fundamental concepts in probability, specifically independence and conditional probability. For example, consider the probability of an offspring from generation t being homozygous dominant. Alleles are inherited independently from each parent. A dominant allele can be inherited from a homozygous dominant parent with probability 1, or from a heterozygous parent with probability 0.5. To represent this reasoning in an equation, let $A_t$ represent inheritance of a dominant allele from a parent. Furthermore, let $AA_{t-1}$ and $Aa_{t-1}$ represent potential parental genotypes in the preceding generation.

\[
\begin{aligned}
p_t &= P(A_t, A_t) = P(A_t)^2 \\
&= \left( P(A_t \mid AA_{t-1}) P(AA_{t-1}) + P(A_t \mid Aa_{t-1}) P(Aa_{t-1}) \right)^2 \\
&= \left( (1)\,p_{t-1} + (0.5)\,2q_{t-1} \right)^2 \\
&= \left( p_{t-1} + q_{t-1} \right)^2
\end{aligned}
\]

The same reasoning, applied to the other genotypes, yields the two remaining recurrence relations. Equilibrium occurs when each proportion is constant between subsequent generations. More formally, a population is at equilibrium at generation t when

\[ 0 = p_t - p_{t-1}, \qquad 0 = q_t - q_{t-1}, \qquad 0 = r_t - r_{t-1}. \]

For the homozygous dominant proportion, substituting the recurrence relation gives $0 = p_{t-1}^2 + 2p_{t-1}q_{t-1} + q_{t-1}^2 - p_{t-1}$. First consider the case where $p_{t-1} = 0$, and note that it implies that $q_{t-1} = 0$ and $r_{t-1} = 1$. Now consider the remaining case, where $p_{t-1} \neq 0$:

\[ 0 = p_{t-1}\left( p_{t-1} + 2q_{t-1} + \frac{q_{t-1}^2}{p_{t-1}} - 1 \right), \qquad \text{so} \qquad 0 = \frac{q_{t-1}^2}{p_{t-1}} - r_{t-1}, \]

where the final step holds because the proportions must sum to one, $p_{t-1} + 2q_{t-1} + r_{t-1} = 1$. In both cases, $q_{t-1}^2 = p_{t-1} r_{t-1}$. It can be shown that the other two equilibrium conditions imply the same equation. Together, the solutions of the three equilibrium equations imply sufficiency of Hardy's condition for equilibrium. Since the condition always holds for the second generation, all succeeding generations have the same proportions.

An example computation of the genotype distribution given by Hardy's original equations is instructive. The phenotype distribution from Table 3 above will be used to compute Hardy's initial genotype distribution. Note that the p and q values used by Hardy are not the same as those used above.

\[
\begin{aligned}
\text{sum} &= \text{obs}(AA) + 2 \times \text{obs}(Aa) + \text{obs}(aa) = 1469 + 2 \times 138 + 5 = 1750 \\
p &= \frac{1469}{1750} = 0.83943 \\
2q &= \frac{2 \times 138}{1750} = 0.15771 \\
r &= \frac{5}{1750} = 0.00286
\end{aligned}
\]

As checks on the distribution, compute

\[ p + 2q + r = 0.83943 + 0.15771 + 0.00286 = 1.00000 \]

and

\[ E_0 = q^2 - pr = 0.00382. \]

For the next generation, Hardy's equations give

\[
\begin{aligned}
p_1 &= (p + q)^2 = 0.84325 \\
2q_1 &= 2(p + q)(q + r) = 0.15007 \\
r_1 &= (q + r)^2 = 0.00668
\end{aligned}
\]

and

\[ E_1 = q_1^2 - p_1 r_1 = 0.00000, \]

which are the expected values. The reader may demonstrate that subsequent use of the second-generation values for a third generation will yield identical results.
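The example computation can be reproduced with a few lines of Python. This is a minimal sketch under the conventions of the passage above (proportions written as p : 2q : r, with the counts 1469, 138 and 5 taken from the text); the function names `hardy_step` and `hardy_e` are illustrative and do not come from Hardy's paper.

```python
def hardy_step(p, q, r):
    """Apply one generation of random mating to proportions p : 2q : r.

    Returns (p1, q1, r1) using the recurrence quoted from Hardy:
    p1 = (p + q)^2, q1 = (p + q)(q + r), r1 = (q + r)^2.
    """
    return (p + q) ** 2, (p + q) * (q + r), (q + r) ** 2


def hardy_e(p, q, r):
    """Hardy's equilibrium quantity E = q^2 - p*r (zero at equilibrium)."""
    return q * q - p * r


# Initial distribution, following the text's normalisation by
# obs(AA) + 2*obs(Aa) + obs(aa).
total = 1469 + 2 * 138 + 5
p0, q0, r0 = 1469 / total, 138 / total, 5 / total

# Second and third generations.
p1, q1, r1 = hardy_step(p0, q0, r0)
p2, q2, r2 = hardy_step(p1, q1, r1)

e0 = hardy_e(p0, q0, r0)  # nonzero: the initial generation is not at equilibrium
e1 = hardy_e(p1, q1, r1)  # zero up to rounding: equilibrium after one step
```

Applying `hardy_step` a second time leaves the proportions unchanged up to floating-point rounding, which is the numerical counterpart of the statement that all succeeding generations have the same proportions.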
9 Graphical representation

• Multinomial distribution (Hardy–Weinberg is a trinomial distribution with probabilities $(\theta^2,\ 2\theta(1 - \theta),\ (1 - \theta)^2)$)

11 References

[1] The term frequency usually refers to a number or count, but in this context, it is synonymous with probability.

[2] http://www.mun.ca/biology/scarr/2900_HW_for_dioecious.html

[3] Hartl DL, Clarke AG (2007) Principles of population genetics. Sunderland, MA: Sinauer

• Ford, E.B. (1971). Ecological Genetics, London.
• Guo, Sw; Thompson, Ea (Jun 1992). "Performing the exact test of Hardy–Weinberg proportion for multiple alleles". Biometrics. 48 (2): 361–72. doi:10.2307/2532296. ISSN 0006-341X. JSTOR 2532296. PMID 1637966.
• Hardy, G. H. (Jul 1908). "Mendelian Proportions in a Mixed Population" (PDF). Science. 28 (706): 49–50. doi:10.1126/science.28.706.49. ISSN 0036-8075. PMID 17779291.
13 External links
• EvolutionSolution (at bottom of page)