Slides 404 chp08 2023
1
outline
• the drift process
– the Wright-Fisher model
– change in heterozygosity
– effective population size
• the coalescent
– randomness of gene genealogies
• drift and demographic history
– bottlenecks and founder effects
• mutation, drift and selection
• the neutral theory
– why most substitutions are neutral
– the null model of molecular evolution
– (tests for positive selection)
• the molecular clock
– variability in clock rates
2
loss of heterozygosity during
out-of-Africa
3
http://www.sciencemag.org/cgi/content/abstract/319/5866/1100
why do species differences
accumulate linearly with time?
5
weird alleles in isolated
populations
6
weird alleles in isolated
populations
7
http://slideplayer.com/slide/4659685
weird alleles in isolated
populations
• Ellis-van Creveld syndrome: dwarfism &
polydactyly
• 1/100,000
• but 6% in US Amish
8
https://www.nature.com/articles/ng0300_203
weird alleles in isolated
populations
9
Wright-Fisher populations
• like HW, no selection, no mutation, no
migration, random mating…
10
Wright-Fisher populations
11
Wright-Fisher populations
• random sampling with replacement
• some alleles chosen multiple times (because
we assume the infinite gamete pool)
• and some never
12
WF exercise
• exercise: if N = 10 and pA1 = 0.1
– i.e. single copy
• P [loss of A1] in one generation ?
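A minimal sketch of how the exercise above can be checked in R. It assumes that N = 10 counts gene copies (so a frequency of 0.1 is a single copy) and that the next generation is 10 independent draws from the gamete pool; the variable names are illustrative.

```r
# Assumption: N = 10 gene copies, A1 present as a single copy (p = 0.1)
n_copies <- 10
p_A1 <- 0.1

# A1 is lost if none of the 10 draws picks it: (1 - p)^n
p_loss_direct <- (1 - p_A1)^n_copies

# Same quantity as the binomial probability of drawing zero copies of A1
p_loss_binom <- dbinom(0, size = n_copies, prob = p_A1)

print(c(direct = p_loss_direct, binomial = p_loss_binom))
```

Both values come to about 0.35, so under these assumptions a single-copy allele is lost in the first generation roughly one time in three.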
13
genetic drift
• random fluctuation of allele freqs across
generations
14
genetic drift
• drift, the sole force on neutral alleles
– causes increase & decrease, loss or fixation
– eventually one of the neutral alleles will fix
– but fixation by drift will take longer than by
selection
• but affects all alleles (neutral or under
selection), weakly or strongly
– depending on s (selection) and N (pop size)
15
genetic drift: random changes
https://www.quora.com/What-is-a-random-walk
16
drifting variants
17
Hamilton, Pop. Gen.
drift experiments in a small
population
19
what is expected p' ?
• expectation = average of all
possibilities
• expected p' = p
• both alleles have equal
chance to increase or
decrease
21
what is the probability of fixation
by drift?
• P[fixation of A1] = frequency of A1
22
exercise: fixation probability of a
neutral allele
• a neutral allele at frequency 70% → what is its
probability of fixation?
• a beneficial allele at frequency 70% → what is
its probability of fixation (more or less than
70%)?
23
drift, inbreeding, heterozygosity
• in the long term: drift decreases H, increases
homozygosity → causes loss or fixation of
alleles
• in the short term: drift can lead to temporary
increase in H
25
drift, inbreeding, heterozygosity
• assume no drift ~ inbreeding in the past
• assume all 2N alleles unique → F(t=0) = 0, then?
• F(t=1) = 1/2N
• → chance that the same allele is chosen twice
26
drift, inbreeding, heterozygosity
• imagine a WF pop of 2 diploid indv, with 4 alleles:
• A1, A2, A3, A4
• F = chance of an individual being homozygous
• F(t=0) = 0
• to create the next generation, imagine an infinite
gamete pool, each allele represented by its
frequency
• if we choose one A1, what is the chance we
choose A1 again to create a homozygous
individual? → F(t=1) = 1/4
27
drift, inbreeding, heterozygosity
• so the expected F(t=1) = 1/2N
• the empirical F(t=1) will vary around this value
29
drift, inbreeding, heterozygosity
• F increases by 1/2N each generation via drift
30
drift, inbreeding, heterozygosity
• so, offspring F:
• F' = (1 - 1/2N)*F + 1/2N
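The recursion above can be iterated to see how quickly F approaches 1. A minimal sketch, assuming F starts at 0; N = 50 and 200 generations are illustrative values, not taken from the slides.

```r
N <- 50       # diploid population size (assumed for illustration)
gens <- 200   # generations to follow (assumed)
F_t <- numeric(gens + 1)  # F_t[1] is generation 0, so F starts at 0

for (g in 1:gens) {
  # F' = (1 - 1/2N) * F + 1/2N
  F_t[g + 1] <- (1 - 1 / (2 * N)) * F_t[g] + 1 / (2 * N)
}

# Closed form of the same recursion when F starts at 0
F_closed <- 1 - (1 - 1 / (2 * N))^(0:gens)

# Expected heterozygosity relative to generation 0 is 1 - F
H_rel <- 1 - F_t
print(round(c(F_gen1 = F_t[2], F_gen100 = F_t[101], F_gen200 = F_t[201]), 3))
```

The first step reproduces F = 1/2N, and the remaining heterozygosity 1 - F shrinks by a factor of (1 - 1/2N) every generation.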
33
why is drift important?
• drift changes allele freqs all the time
• drift signatures inform about past population
sizes = demographic history
• drift shapes population and species
divergence at the molecular level, and
possibly also at the phenotypic level
• drift can facilitate speciation
34
why is drift important?
• drift can change allele freqs against selection,
which may be harmful for a pop:
• e.g. losing recessive beneficial alleles, or fixing
slightly deleterious variants
• drift can explain important biological
phenomena, like ageing
• drift can also help avoid getting stuck at local
adaptive fitness peaks
35
how to measure drift?
36
effective population size (Ne)
40
past bottlenecks reduce Ne
• when N fluctuates in time (m generations) →
long term Ne determined by the harmonic
mean → influenced by smaller values
42
Hamilton, Population Genetics
past bottlenecks reduce Ne
• if N = 100,000 for 95 generations, and N = 50
for only 5 generations, what is Ne?
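The exercise above can be worked through directly. A minimal sketch, assuming long-term Ne is the harmonic mean of the per-generation sizes over the 100 generations; the variable names are illustrative.

```r
# Population size in each of the 100 generations described in the exercise
N_t <- c(rep(100000, 95), rep(50, 5))

# Long-term Ne as the harmonic mean of the per-generation sizes
Ne_harmonic <- length(N_t) / sum(1 / N_t)

# Arithmetic mean, for comparison
N_arithmetic <- mean(N_t)

print(c(harmonic = Ne_harmonic, arithmetic = N_arithmetic))
```

The harmonic mean is about 991, while the arithmetic mean is about 95,000: five generations at N = 50 dominate the long-term Ne.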
43
sex-bias in breeding reduces Ne
• Ne also reflects the numbers of male (Nm) and
female (Nf) breeders: Ne = 4*Nm*Nf / (Nm + Nf)
• e.g. with 300 breeders of one sex and 10 of the
other: Ne = (4*300*10)/310 ~ 39
45
drift and pop divergence
• if two populations split (become isolated),
how will drift affect their similarity?
• genetic drift experiment – how to design?
46
drift and pop divergence
• drift experiment with fruit fly
• start at 50% freq of a white eye allele
• randomly choose 8 males + 8 females in
each population
• change?
47
drift and pop divergence
48
drift and pop divergence
49
microsatellite (STR) diversity in
Galapagos lizards
50
microsatellite (STR) diversity in
Galapagos lizards
54
coalescence
55
coalescence
• if we go far enough back, how many ancestors
of a locus with n=10 copies today?
56
coalescence
• if we go far enough back, how many ancestors
of a locus with n=10 copies today?
• just one
– remember that at each locus, one allele will
eventually fix
– e.g. human “mitochondrial Eve” lived about 200k
years ago
– how about human "Y-chr Adam"?
57
average time to coalescence?
• the average time to coalesce for a pair of
gene copies in a diploid species with pop size
N?
58
average time to coalescence?
• coalescence can be treated as a Poisson process
• = a rare event with a small, constant
probability per generation
• → waiting times for coalescence are
approximately exponentially distributed
59
average time to coalescence?
• random event with probability P → average
waiting time = 1/P
– e.g. 2 individuals having the same birthday: 1/365
→ on average you need to check 365 pairs to
find one with the same birthday
• probability of 2 alleles having the same
ancestor in the previous generation = 1/2N
• assuming fixed population size
• average waiting time for coalescence = 2N
generations
60
average time to coalescence
• the average time to coalesce for a pair of gene
copies in a diploid species with pop size N:
• 2N
61
average time to coalescence
• waiting time shortens with more gene copies
= possible combinations that may coalesce
within a certain time
62
average time to coalescence
• the average time to coalesce for all gene
copies in a diploid species with pop size N:
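The standard coalescent result for the two questions above can be written out as follows. A sketch under the same assumptions as the pairwise case (constant size N, diploid, neutral locus); the symbols k (number of lineages currently present) and n (number of sampled copies) are introduced here for illustration.

```latex
% With k lineages there are k(k-1)/2 pairs, each coalescing with probability 1/(2N) per generation
P(\text{coalescence among } k \text{ lineages}) \approx \binom{k}{2}\frac{1}{2N}
\quad\Rightarrow\quad
E[T_k] = \frac{2N}{\binom{k}{2}} = \frac{4N}{k(k-1)}

% Summing the waiting times from k = n down to k = 2
E[T_{\mathrm{MRCA}}] = \sum_{k=2}^{n} \frac{4N}{k(k-1)} = 4N\left(1 - \frac{1}{n}\right)
```

For k = 2 this gives the 2N generations of the pairwise case, and for large n the total approaches 4N generations.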
63
average time to coalescence
• comparing between pop with small vs. large
Ne → coalescence faster (= fixation faster)
with small Ne (i.e. faster drift)
64
coalescence times for different loci
• simulations under the same Ne:
https://journals.plos.org/plosgenetics/article/figure?id=10.1371/journal.pgen.0020068.g001
66
how many differences between 2 gene
copies at a neutral locus?
68
how many differences between 2
gene copies at a neutral locus?
69
how many differences between 2
gene copies at a neutral locus?
70
how many differences between 2
gene copies at a neutral locus?
• μ = mutation rate per generation
• expected number of mutations between 2
copies at a neutral diploid locus
= 2 lineages x time to coalescence x mutation rate
= 2 x 2N x μ
= 4Nμ
• so, higher Ne → higher diversity
(heterozygosity)
• given diversity & μ → we can estimate Ne
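A minimal sketch of the last point: rearranging diversity = 4Nμ gives Ne = diversity / (4μ). The input values below are placeholders chosen for illustration, not data from the slides.

```r
# Hypothetical inputs (assumed for illustration only)
pi_obs <- 0.001    # average pairwise differences per site
mu <- 1.25e-8      # mutation rate per site per generation

# Rearranged from: expected diversity = 4 * Ne * mu
Ne_est <- pi_obs / (4 * mu)
print(Ne_est)
```

With these placeholder values the estimate is about 20,000. The result scales inversely with the assumed mutation rate, so any uncertainty in μ carries over directly to Ne.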
71
coalescence / drift at different loci
• in eukaryotes doing meiosis, each locus has a
different drift history
– we are all mosaics of ancestral genes
• the history of loci with different transmission
modes can be even more different
72
coalescence / drift at different loci
• assuming equal numbers of males and
females:
• autosomal Ne ~ 4 * mitochondrial Ne
• X chr Ne ~ 3 * mitochondrial Ne
• slowest drift = longer coalescence times at
autosomes
• faster drift = shorter coalescence times in
mitochondria → less diversity within pop,
more divergence between pop
73
coalescence / drift at different loci
• e.g. human “mitochondrial Eve” (most recent
common ancestor) lived ~ 200kya
• no recombination
75
coalescence, mutation, selection
• coalescence fast under positive selection
(relative to neutral loci)
– faster coalescence = recent ancestry = fewer
variants can accumulate (relative to neutral) →
regions under positive selection will have less
genetic diversity (relative to neutral)
• coalescence slow under balancing selection
(relative to neutral loci)
– more polymorphism than neutral
76
bottlenecks & founder effects
• cases of extreme drift
• reduce diversity in the population
• (nearly) all copies coalesce around the bottleneck time
• lost diversity cannot be regained for long
periods, even if pop size increases
– because mutation is a slow event
77
bottlenecks & founder effects
78
bottlenecks & founder effects
79
bottlenecks & founder effects
80
hunted seals
82
bottlenecks & founder effects
83
bottlenecks can support speciation
• rapid coalescence at bottleneck
• loss of self-incompatibility → bottleneck →
differentiation → speciation
84
https://www.pnas.org/content/106/13/5246
human migrations out-of-Africa
85
http://www.sciencemag.org/cgi/content/abstract/319/5866/1100
human migrations out-of-Africa
86
Zimmer 2013 The Tangled Bank
• out-of-Africa branches longer than African
branches
• Americas even longer
• why?
87
http://www.sciencemag.org/cgi/content/abstract/319/5866/1100
• drift: alleles fix during founder effects →
differentiate those populations from others
88
http://www.sciencemag.org/cgi/content/abstract/319/5866/1100
founder effects in spruce
89
founder effects in spruce
• mtDNA diversity limited in newly colonized
regions
91
mutation-drift equilibrium
• F = probability of identity by descent (IBD),
i.e. without mutation
• every generation:
• F increases by 1/2N by drift
• F decreases by mutation: both copies must stay
unmutated, (1 - μ)^2 ~ 1 - 2μ
• F at equilibrium?
92
mutation-drift equilibrium
• F at equilibrium?
93
mutation-drift equilibrium
• at equilibrium: F(offspring) = F(parental)
94
mutation-drift equilibrium
• probability of homozygosity of 2 gene copies:
$F_{\text{equilibrium}} = \frac{1}{4N\mu + 1}$
– (can also be calculated as the probability of
coalescence happening before mutation)
• expected number of differences between 2
gene copies: $H = 4N\mu$
• Heterozygosity $= 1 - F_{\text{equilibrium}} = \frac{4N\mu}{4N\mu + 1}$
• as $4N\mu \to 0$, Heterozygosity $\to 4N\mu$
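One way to obtain the equilibrium value is to combine the drift recursion with mutation. A sketch, assuming every mutation creates a new allele (infinite-alleles model) and dropping terms of order μ² and μ/N:

```latex
% Two copies are identical if neither mutated, and they either just coalesced or were already identical
F' = (1-\mu)^2\left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right)F\right]

% At equilibrium F' = F; with (1-\mu)^2 \approx 1 - 2\mu
F \approx (1 - 2\mu)\left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right)F\right]
\;\Rightarrow\;
F\left(\frac{1}{2N} + 2\mu\right) \approx \frac{1}{2N}
\;\Rightarrow\;
F_{\text{equilibrium}} \approx \frac{1}{4N\mu + 1}
```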
95
selection vs. drift
• fate of new alleles (= rare alleles): determined
by drift, even if beneficial
96
selection vs. drift
• fixation probability P of a
single copy beneficial (and
partial dominant) allele, in a
large population (JBS
Haldane)
• fitness = 1 + s
• P[fixation of beneficial
allele] = 2s
97
selection vs. drift
• P[fixation of a de novo beneficial allele] = 2s
• if s = 5% → P = 10%
• if s = 1% → P = 2%
• so, many beneficial alleles lost at early stage
• even more lost if beneficial alleles are
recessive
99
selection vs. drift
starting at
1% freq
100
selection vs. drift
101
selection or drift?
102
selection or drift?
103
a neutral explanation
104
a neutral explanation
105
fitness effects
these can drift to high
frequencies and contribute to
polymorphism and fixation
106
neutral theory of molecular evolution
• in nature, molecular diversity & divergence
mainly shaped by:
– mutation
– negative selection
– neutral drift
– directional selection also contributes to
divergence, and balancing selection contributes to
diversity, but both at lower levels than mutation
and drift
• how come most substitutions neutral?
107
how come most substitutions neutral?
• in many eukaryotes most DNA is non-coding
& non-functional
• coding region variants can be synonymous
(degeneracy of the code)
• functional (non-synonymous or regulatory)
variants may have no effect on fitness
• selection can be ineffective when Ne and/or s
are small
– e.g. deleterious mutations can behave like
neutral
108
most substitutions synonymous
110
many non-synonymous changes can
also be neutral
• even many non-synonymous changes can be
neutral
• human-rat-chicken comparison of TRPV
111
how come most substitutions neutral?
112
strength of selection depends on 2Nes
113
strength of selection depends on 2Nes
115
neutral theory of molecular evolution
116
molecular clock
• 1960s:
• molecular divergence ~ divergence times
estimated from the fossil record
117
molecular clock
• 1960s:
• equidistance principle:
outgroup-A distance =
outgroup-B distance
• → rate of substitution similar among
branches, in general
118
molecular clock
• equidistance principle
119
molecular clock
• large populations gain
more mutations per gen
• small populations fix
more mutations (by drift)
per gen
• remember the case of
founder effects in human
populations
120
molecular clock
• k neutral loci, N pop size, μ mutation rate
• 2Nkμ neutral mutations arise each gen
• 1/2N fix
• the rate of new alleles becoming fixed:
• substitution rate = 2Nkμ * 1/2N = kμ
• N not a factor in neutral substitution rate →
molecular clock
121
molecular clock
• use mutation rate per time for a DNA/protein
sequence → calculate the time since the most
recent common ancestor (TMRCA) of 2 taxa
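The dating step can be written as a one-line calculation. A sketch, assuming a constant substitution rate r per site per year on both lineages; d (observed divergence per site) and the numbers in the example are hypothetical.

```latex
% Both lineages accumulate substitutions since the split, hence the factor 2
d = 2\,r\,T \quad\Rightarrow\quad T = \frac{d}{2r}

% Hypothetical example: d = 0.02 substitutions per site, r = 10^{-9} per site per year
T = \frac{0.02}{2 \times 10^{-9}} = 10^{7}\ \text{years}
```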
124
does the clock always tick
perfectly?
• early studies: protein divergence appears
linear
126
some genome regions have higher
mutation rate
127
loci under purifying selection
evolve slower
• influenza A: the effect of negative selection
128
loci under positive selection evolve
faster
• two loci in HIV – which locus is under pos and
which one negative selection?
129
sequence divergence may saturate
with time
130
transitions saturate faster than
transversions - why?
131
which loci to use for
molecular clock analyses?
132
which loci to use for
molecular clock analyses?
133
molecular clock
• clock rate will vary, possibly depending on:
o mutation rate
o selection (both negative & positive)
o saturation (time since divergence)
o generation time
134
applications to experiment with
• drift and selection
• https://cjbattey.shinyapps.io/driftR/
– try drift, directional selection, balancing
selection, drift and selection, study time to
fixation of a neutral allele
• coalescence
• https://cdmuir.shinyapps.io/coalescence/
135
additional slides
136
generation time and the molecular
clock
• rodent branches not 50x longer than primate
branches – why?
137
http://www.nature.com/nature/journal/v437/n7055/full/nature04072.html
generation time and the molecular
clock
• divergence correlates better with time (years)
than with number of generations
138
generation time and the molecular
clock
• generation time has limited effect – why?
• nearly neutral theory (Ohta)
139
generation time and the molecular
clock
140
generation time and the molecular
clock
• in species with long gen time + small N:
• slightly deleterious mutations → effectively
neutral → can fix and accumulate by drift
142
WF model simulations in R
143
WF model simulations in R
144
WF model simulations in R
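# Wright-Fisher drift in one population of N gene copies
N <- 1000     # number of gene copies in the population
p <- 0.2      # starting frequency of the focal allele
gens <- 2000  # number of generations to simulate
# Starting population: 1 = focal allele, 0 = other allele
x <- rep(c(1, 0), times = round(c(p, 1 - p) * N))
freqs <- numeric(gens)  # allele frequency in each generation
freqs[1] <- mean(x)
for (i in 2:gens) {
  # next generation = N copies sampled with replacement from the current one
  x <- sample(x, replace = TRUE)
  freqs[i] <- mean(x)
}
plot(freqs, type = "l", ylim = c(0, 1), xlab = "Generation", ylab = "Frequency",
     main = paste("p =", p, "; N =", N))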
145
WF model simulations in R
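N <- 300
p <- 0.2
gens <- 2000
sims <- 20  # number of independent replicate populations
# Each column of sim holds the frequency trajectory of one replicate
sim <- sapply(1:sims, function(j) {
  x <- rep(c(1, 0), times = round(c(p, 1 - p) * N))
  freqs <- numeric(gens)
  freqs[1] <- mean(x)
  for (i in 2:gens) {
    x <- sample(x, replace = TRUE)
    freqs[i] <- mean(x)
  }
  freqs
})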
146
WF model simulations in R
# One line per replicate (matplot draws each column of sim)
matplot(sim, type = "l", ylim = c(0, 1), xlab = "Generation",
        ylab = "Frequency", main = paste(sims, "trials ; p =", p, "; N =", N))
147
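The simulations above can also be used to check the earlier claim that a neutral allele fixes with probability equal to its starting frequency. A minimal sketch: it draws the next generation's allele count with rbinom() instead of sample(), which is the same Wright-Fisher sampling step, and N, sims, max_gens and the seed are illustrative choices.

```r
set.seed(1)       # arbitrary seed, only for reproducibility
N <- 50           # number of gene copies (kept small so runs end quickly)
p <- 0.2          # starting frequency
sims <- 2000      # replicate populations
max_gens <- 5000  # upper limit on generations per run

counts <- rep(round(p * N), sims)  # copies of the focal allele per replicate
for (g in 1:max_gens) {
  # binomial sampling of N copies; replicates at 0 or N stay there
  counts <- rbinom(sims, size = N, prob = counts / N)
  if (all(counts == 0 | counts == N)) break
}

fixed_fraction <- mean(counts == N)
print(fixed_fraction)
```

The fraction of replicates in which the allele fixed should land close to p = 0.2, with some sampling noise because only 2000 replicates are run.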